Training
- 8x A6000s
- Forked version of unsloth for efficient training
- Sequence Length: 4096
- Effective batch size: 128
- Learning Rate: 2e-5 with linear decay
- Epochs: 1
- Base model trained with QLoRA (rank 64, alpha 16) and MoE adapters/routers trained in bf16
- Num Experts: 16
- Top K: 4
- Adapter Dim: 512
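To make the "Num Experts" and "Top K" settings above concrete, here is a minimal pure-Python sketch of generic top-k expert routing for a single token. It is an illustration only, not the sparsetral implementation: the helper names `route_token` and `combine` are hypothetical, and the choice to softmax over only the selected logits is an assumption (a common convention in MoE layers) rather than something this model card states.

```python
import math

NUM_EXPERTS = 16  # "Num Experts" from the list above
TOP_K = 4         # "Top K" from the list above


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (index, weight) pairs.

    Assumption: weights are a softmax over the selected logits only,
    so they sum to 1 across the chosen experts.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits, highest first.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def combine(expert_outputs, routing):
    """Weighted sum of the chosen experts' output vectors."""
    dim = len(expert_outputs[routing[0][0]])
    mixed = [0.0] * dim
    for idx, weight in routing:
        for d in range(dim):
            mixed[d] += weight * expert_outputs[idx][d]
    return mixed


# Toy example: expert i has router logit 0.1 * i and outputs the vector [i, i].
logits = [0.1 * i for i in range(NUM_EXPERTS)]
routing = route_token(logits)
outputs = [[float(i), float(i)] for i in range(NUM_EXPERTS)]
mixed = combine(outputs, routing)
print(routing)
print(mixed)
```

Only 4 of the 16 experts contribute to each token's output, which is what keeps per-token compute well below that of running every expert.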
Prompt Format
<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
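As a quick illustration of the template above, the following sketch fills it in for a short conversation given as (role, message) pairs. The names `ROLE_TEMPLATE` and `render_chatml` are hypothetical and exist only for this example; the `\n` sequences in the template are real newline characters in the rendered string.

```python
# One turn of the prompt format shown above, with the role made a parameter.
ROLE_TEMPLATE = "<|im_start|>{role}\n{message}<|im_end|>\n"


def render_chatml(turns):
    """Render (role, message) pairs and leave the assistant turn open."""
    body = "".join(
        ROLE_TEMPLATE.format(role=role, message=message)
        for role, message in turns
    )
    # The trailing open assistant header cues the model to write its reply.
    return body + "<|im_start|>assistant\n"


example = render_chatml(
    [
        ("system", "You are a helpful assistant."),
        ("user", "Hello!"),
    ]
)
print(example)
```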
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2", device_map="auto", trust_remote_code=True).eval()
system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"
def construct_prompt(messages):
    prompt = ""
    for message in messages:
        if message["from"] in ["human", "user"]:
            prompt += user_str.format(
                message=message["value"]
            )
        elif message["from"] in ["gpt", "assistant"]:
            prompt += assistant_str.format(
                message=message["value"]
            )
        elif message["from"] in ["system", "instruction"]:
            prompt += system_str.format(
                message=message["value"]
            )
        else:
            raise ValueError(
                f"Unknown message type: {message['from']}"
            )
    return prompt + "<|im_start|>assistant\n"
system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"
messages = [
    {"from": "system", "value": system},
    {"from": "user", "value": user},
]
prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
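For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the `print` call above shows the prompt as well as the reply. If only the reply is wanted, one option is to slice off the prompt tokens before decoding. The helper name `continuation_ids` below is hypothetical; the sketch works on plain lists and should apply equally to a 1-D tensor such as `pred[0]`.

```python
def continuation_ids(output_ids, prompt_length):
    """Return only the token ids generated after the prompt."""
    return output_ids[prompt_length:]


# Toy example with plain integers standing in for token ids.
prompt_ids = [101, 102, 103]
full_output = prompt_ids + [7, 8, 9]
print(continuation_ids(full_output, len(prompt_ids)))

# With the objects from the usage snippet above, this would look like:
#   prompt_len = inputs["input_ids"].shape[1]
#   reply_ids = continuation_ids(pred[0], prompt_len)
#   print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```

Note also that `max_length=4096` counts prompt and generated tokens together; `max_new_tokens` is the `generate` argument that limits only the generated part.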
Other Information
Paper reference: Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
Forked repo with Mistral support (sparsetral)
For faster inference, check out our fork of vLLM, which adds sparsetral support