Credits: Maxime Labonne https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac
(With minor alterations)
NeuralHermes 2.5 - Mistral 7B
NeuralHermes is based on the teknium/OpenHermes-2.5-Mistral-7B model, further fine-tuned with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs dataset.
Usage
You can run this model using the following code:
import transformers
from transformers import AutoTokenizer

# Repository to load; this is the model this card describes
new_model = "delayedkarma/NeuralHermes-2.5-Mistral-7B"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text (max_length counts the prompt tokens as well as the new ones)
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
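To make the prompt-formatting step concrete, here is a rough sketch of the string that apply_chat_template is expected to produce. It assumes the tokenizer ships the ChatML template used by OpenHermes-2.5; the helper name to_chatml is hypothetical and not part of transformers. The tokenizer's own chat_template remains the authoritative source for the exact format.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts as a ChatML string (assumed template)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
prompt = to_chatml(message)
print(prompt)
```

Comparing this output with the result of tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False) is a quick way to check whether the assumption holds for the loaded tokenizer.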
Training hyperparameters
LoRA:
- r=16
- lora_alpha=16
- lora_dropout=0.05
- bias="none"
- task_type="CAUSAL_LM"
- target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
Training arguments:
- per_device_train_batch_size=2 # Changed from 4
- gradient_accumulation_steps=4
- gradient_checkpointing=True
- learning_rate=2e-5 # Changed from 5e-5
- lr_scheduler_type="cosine"
- max_steps=250 # Changed from 200
- optim="paged_adamw_32bit"
- warmup_steps=100
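The training arguments above imply an effective batch size and a learning-rate curve that are easy to work out by hand. The sketch below does that arithmetic in plain Python. It assumes a single GPU and a schedule of linear warmup followed by a half-cosine decay to zero, which is how the "cosine" scheduler in transformers is commonly described; the function name lr_at is hypothetical.

```python
import math

PER_DEVICE_BATCH = 2   # per_device_train_batch_size
GRAD_ACCUM = 4         # gradient_accumulation_steps
MAX_STEPS = 250        # max_steps
WARMUP_STEPS = 100     # warmup_steps
PEAK_LR = 2e-5         # learning_rate


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Examples consumed per optimizer step on one device (single-GPU assumption)
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
# Preference pairs seen over the whole run
examples_seen = effective_batch * MAX_STEPS

print(effective_batch, examples_seen)
for step in (0, 50, 100, 175, 250):
    print(step, lr_at(step))
```

Under these assumptions the run processes 8 preference pairs per optimizer step, and the warmup phase covers 100 of the 250 steps, so the learning rate only reaches its 2e-5 peak at step 100.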
DPOTrainer:
- beta=0.1
- max_prompt_length=1024
- max_length=1536
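To illustrate what beta controls, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. The log-probability values are made up for illustration, and the function name dpo_loss is hypothetical; this is not the DPOTrainer implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * log-ratio margin))."""
    # Implicit rewards: how far the policy has moved from the reference model
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical sequence log-probabilities, for illustration only
untrained = dpo_loss(-50.0, -50.0, -50.0, -50.0)   # policy equals reference
improved = dpo_loss(-45.0, -55.0, -50.0, -50.0)    # policy prefers chosen answer
print(untrained, improved)
```

When the policy matches the reference model the loss is log 2, and it falls as the policy assigns relatively more probability to the chosen answer. A larger beta makes the same log-probability shift count for more, which penalizes drifting from the reference model more strongly.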
Model tree for delayedkarma/NeuralHermes-2.5-Mistral-7B
Base model: mistralai/Mistral-7B-v0.1
Finetuned: teknium/OpenHermes-2.5-Mistral-7B