No longer available, why?
Whyyyyyyyyyyyyyyyyyyyyyyyyyy?
It costs ~30K USD / month to keep up the inference widget, so we decided to turn it off after the first month. Really sorry :(
You can of course still download the model and run it on your own hardware if you have the resources available.
oh no
I like it more than bloom
Same
NOOOOOOOOOO 😭
:(
On the bright side mt0-xxl & mt0-xxl-mt can still be used via the inference widget. 🤗
Definitely share if you find them more / less useful & if so why 🧐
In my experiments I found them better at following instructions requiring short answers & worse at instructions requiring long answers.
Bloomz knows when to stop; Bloom doesn't.
I also found that Bloomz tended to stop too soon. When summarizing text, it ended after a single sentence, and since it only generated one sentence, it never got the chance to fully follow the prompt. I honestly found Bloom more helpful: it handled longer prompts well, especially few-shot prompts, while Bloomz seems to only work with short Q&A prompts. I do have hope that, as it keeps improving, Bloomz will become more diverse in capability.
I think it's because of the xP3 dataset: most of the answers in that dataset are short.
Now you can run inference and fine-tune BLOOMZ (the 176B English version) using the Petals swarm.
You can use BLOOMZ via this Colab notebook to get an inference speed of 1-2 sec/token for a single sequence. Running the notebook on a local machine is also fine; you'd only need 10+ GB of GPU memory or 12+ GB of RAM (though it will be slower without a GPU).
Note: Don't forget to replace bigscience/bloom-petals with bigscience/bloomz-petals in the model name.
As an example, there is a chatbot app running BLOOMZ this way.
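If you'd rather call the swarm from a plain Python script instead of the notebook, here's a minimal sketch along the lines of the Petals README of that era (the class and model names below reflect how Petals worked at the time and may have changed since):

from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

# bigscience/bloomz-petals is the BLOOMZ checkpoint served by the public swarm
model_name = "bigscience/bloomz-petals"
tokenizer = BloomTokenizerFast.from_pretrained(model_name)
model = DistributedBloomForCausalLM.from_pretrained(model_name)

# Only a small part of the model lives locally; most transformer blocks
# run on remote swarm peers, which is why this fits on a modest GPU
inputs = tokenizer("What is the capital of France?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))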
Bloomz is back and even stronger than before. You can now do token streaming:
pip install sseclient-py
(do NOT install sseclient; be sure to install sseclient-py)
import sseclient
import requests

# Streaming request to the Hugging Face Inference API for bigscience/bloomz
prompt = "Why is the sky blue? Explain in a detailed paragraph."
parameters = {"max_new_tokens": 200, "top_p": 0.9, "seed": 0}
options = {"use_cache": False}
payload = {"inputs": prompt, "stream": True, "parameters": parameters, "options": options}

# stream=True keeps the connection open so tokens arrive as server-sent events
r = requests.post("https://api-inference.huggingface.co/models/bigscience/bloomz", stream=True, json=payload)
sse_client = sseclient.SSEClient(r)

# Print each event (one per generated token) as it arrives
for i, event in enumerate(sse_client.events()):
    print(i, event.data)
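Each event.data above is a JSON string. If you only want the generated text, a small parsing sketch would look like the following (this assumes the stream uses the Inference API's token format, i.e. each event carries a "token" object with a "text" field; treat that shape as an assumption and check event.data yourself first):

import json
import requests
import sseclient

payload = {"inputs": "Why is the sky blue?", "stream": True,
           "parameters": {"max_new_tokens": 200}, "options": {"use_cache": False}}
r = requests.post("https://api-inference.huggingface.co/models/bigscience/bloomz", stream=True, json=payload)
for event in sseclient.SSEClient(r).events():
    data = json.loads(event.data)
    # assumed event shape: {"token": {"id": ..., "text": ...}, ...}
    print(data.get("token", {}).get("text", ""), end="", flush=True)
print()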