metadata
license: apache-2.0
library_name: transformers
base_model:
- flammenai/flammen27-mistral-7B
datasets:
- flammenai/FlameMix-DPO-v1
- flammenai/Grill-preprod-v1_chatML
- flammenai/Grill-preprod-v2_chatML
model-index:
- name: Mahou-1.2a-mistral-7B
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 68.77
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=flammenai/Mahou-1.2a-mistral-7B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 88.14
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=flammenai/Mahou-1.2a-mistral-7B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 64.73
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=flammenai/Mahou-1.2a-mistral-7B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 68.71
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=flammenai/Mahou-1.2a-mistral-7B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 80.11
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=flammenai/Mahou-1.2a-mistral-7B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 63.61
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=flammenai/Mahou-1.2a-mistral-7B
name: Open LLM Leaderboard
Mahou-1.2a-mistral-7B
Mahou is our attempt to build a production-ready conversational/roleplay LLM.
Future versions will be released iteratively and finetuned from flammen.ai conversational data.
1.2a is rebased and retrained to improve comphresion and coherence.
Chat Format
This model has been trained to use ChatML format.
<|im_start|>system
{{system}}<|im_end|>
<|im_start|>{{char}}
{{message}}<|im_end|>
<|im_start|>{{user}}
{{message}}<|im_end|>
Roleplay Format
- Speech without quotes.
- Actions in
*asterisks*
*leans against wall cooly* so like, i just casted a super strong spell at magician academy today, not gonna lie, felt badass.
ST Settings
- Use ChatML for the Context Template.
- Turn on Instruct Mode for ChatML.
- Use the following stopping strings:
["<", "|", "<|", "\n"]
Method
Finetuned using an A100 on Google Colab.
Fine-tune a Mistral-7b model with Direct Preference Optimization - Maxime Labonne
Configuration
LoRA, model, and training settings:
# LoRA configuration
peft_config = LoraConfig(
r=16,
lora_alpha=16,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
load_in_4bit=True
)
model.config.use_cache = False
# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
load_in_4bit=True
)
# Training arguments
training_args = TrainingArguments(
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
gradient_checkpointing=True,
learning_rate=5e-5,
lr_scheduler_type="cosine",
max_steps=2000,
save_strategy="no",
logging_steps=1,
output_dir=new_model,
optim="paged_adamw_32bit",
warmup_steps=100,
bf16=True,
report_to="wandb",
)
# Create DPO trainer
dpo_trainer = DPOTrainer(
model,
ref_model,
args=training_args,
train_dataset=dataset,
tokenizer=tokenizer,
peft_config=peft_config,
beta=0.1,
force_use_ref_model=True
)
# Fine-tune model with DPO
dpo_trainer.train()
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 72.35 |
AI2 Reasoning Challenge (25-Shot) | 68.77 |
HellaSwag (10-Shot) | 88.14 |
MMLU (5-Shot) | 64.73 |
TruthfulQA (0-shot) | 68.71 |
Winogrande (5-shot) | 80.11 |
GSM8k (5-shot) | 63.61 |