mlx-community/dbrx-instruct-4bit
This model was converted to MLX format from databricks/dbrx-instruct
using mlx-lm version b80adbc
after DBRX support was added by Awni Hannun.
Refer to the original model card for more details on the model.
Conversion
Conversion was done with:
python -m mlx_lm.convert --hf-path databricks/dbrx-instruct -q --upload-repo mlx-community/dbrx-instruct-4bit
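The same conversion can probably be driven from Python as well. The sketch below assumes the `convert` function exported by mlx-lm accepts `hf_path`, `quantize` and `upload_repo` keyword arguments, mirroring the CLI flags above. The helper names `conversion_kwargs` and `run_conversion` are hypothetical and only used for illustration.

```python
def conversion_kwargs(hf_path, upload_repo=None, quantize=True):
    """Collect the arguments that mirror the CLI flags --hf-path, -q and --upload-repo."""
    kwargs = {"hf_path": hf_path, "quantize": quantize}
    if upload_repo is not None:
        kwargs["upload_repo"] = upload_repo
    return kwargs


def run_conversion(**kwargs):
    """Run the conversion. Downloads the full model, so call it deliberately."""
    # Imported lazily so the helper above works without mlx-lm installed.
    from mlx_lm import convert

    convert(**kwargs)


# Arguments equivalent to the command shown above (assumption: -q maps to quantize=True).
dbrx_kwargs = conversion_kwargs(
    "databricks/dbrx-instruct",
    upload_repo="mlx-community/dbrx-instruct-4bit",
)
```

Calling `run_conversion(**dbrx_kwargs)` would then start the download, quantization and upload. Uploading requires write access to the target repository.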
Use with mlx
Make sure you first upgrade mlx-lm and mlx to the latest versions.
pip install mlx --upgrade
pip install mlx-lm --upgrade
python -m mlx_lm.generate --model mlx-community/dbrx-instruct-4bit --prompt "Hello" --trust-remote-code --use-default-chat-template --max-tokens 500
Remember, this is an Instruct model, so you will need to use the instruct prompt template by appending --use-default-chat-template to the command.
Example:
python -m mlx_lm.generate --model dbrx-instruct-4bit --prompt "What's the difference between PCA vs UMAP vs t-SNE?" --trust-remote-code --use-default-chat-template --max-tokens 1000
Output:
On my MacBook Pro M2 with 96GB of unified memory, DBRX Instruct in 4-bit uses 70.2GB of RAM for the above prompt.
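As a rough sanity check on that figure, the weight footprint can be estimated by hand. The sketch below rests on assumptions that are not stated in this card: a total of 132 billion parameters for DBRX, the default MLX quantization group size of 64, and one 16-bit scale plus one 16-bit bias stored per group. It covers the weights only, ignores any layers left unquantized as well as the KV cache and activations, so it gives a ballpark rather than a prediction of measured RAM.

```python
TOTAL_PARAMS = 132e9   # assumption: DBRX total parameter count
Q_BITS = 4             # 4-bit quantization, as in this repo
Q_GROUP_SIZE = 64      # assumption: default group size used by mlx_lm.convert
OVERHEAD_BITS = 32     # assumption: fp16 scale + fp16 bias per group


def bits_per_weight(q_bits, group_size, overhead_bits=OVERHEAD_BITS):
    """Effective bits per weight once the per-group scale and bias are included."""
    return q_bits + overhead_bits / group_size


def estimate_weight_bytes(n_params, q_bits=Q_BITS, group_size=Q_GROUP_SIZE):
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight(q_bits, group_size) / 8


estimate = estimate_weight_bytes(TOTAL_PARAMS)
print(f"{bits_per_weight(Q_BITS, Q_GROUP_SIZE)} bits per weight")
print(f"about {estimate / 1e9:.1f} GB ({estimate / 2**30:.1f} GiB) of weights")
```

Under these assumptions the estimate lands in the same range as the usage reported above, which suggests most of the memory goes to the weights themselves.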
If the mlx-lm package has been updated, it can also be installed from pip:
pip install mlx-lm
To use it from Python you can do the following:
from mlx_lm import load, generate
model, tokenizer = load(
    "mlx-community/dbrx-instruct-4bit",
    tokenizer_config={"trust_remote_code": True}
)
chat = [
    {"role": "user", "content": "What's the difference between PCA vs UMAP vs t-SNE?"},
    # We need to add the Assistant role as well, otherwise mlx_lm will error on generation.
    {"role": "assistant", "content": "The "},
]
prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, verbose=True, temp=0.6, max_tokens=1500)
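For repeated questions, the steps above can be wrapped in a small helper. This is a minimal sketch: `build_chat` and `ask` are hypothetical names, and the generation function is passed in as a parameter so the helper can be tried out without loading the model. It reuses only the `apply_chat_template` and `generate` arguments already shown above.

```python
def build_chat(question, assistant_prefix="The "):
    """Build a message list in the same shape as the example above,
    including the pre-filled assistant turn."""
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": assistant_prefix},
    ]


def ask(model, tokenizer, question, generate_fn, max_tokens=1500):
    """Format the question with the chat template and run generation.

    generate_fn is expected to behave like mlx_lm.generate (assumption).
    """
    chat = build_chat(question)
    prompt = tokenizer.apply_chat_template(
        chat, add_generation_prompt=True, tokenize=False
    )
    return generate_fn(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

With the model loaded as shown earlier, a call would look like `ask(model, tokenizer, "What is UMAP?", generate)`.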
Converted and uploaded by eek