license: unknown
library_name: peft
tags:
- mistral
datasets:
- ehartford/dolphin
- garage-bAInd/Open-Platypus
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
mistral-7b-instruct-v0.1
General instruction-following LLM finetuned from mistralai/Mistral-7B-v0.1.
Model Details
Model Description
This instruction-following LLM was built via parameter-efficient QLoRA finetuning of mistralai/Mistral-7B-v0.1 on the first 200k rows of ehartford/dolphin. Finetuning ran on 1x A100 (40 GB SXM) for roughly 20 hours on Google Colab. Only the PEFT adapter weights are included in this model repo, alongside the tokenizer.
- Developed by: Daniel Furman
- Model type: Decoder-only
- Language(s) (NLP): English
- License: Yi model license
- Finetuned from model: mistralai/Mistral-7B-v0.1
Model Sources
- Repository: github.com/daniel-furman/sft-demos
Evaluation Results
Metric | Value |
---|---|
MMLU (5-shot) | Coming |
ARC (25-shot) | Coming |
HellaSwag (10-shot) | Coming |
TruthfulQA (0-shot) | Coming |
Avg. | Coming |
We use EleutherAI's Language Model Evaluation Harness to run the benchmarks above, at the same version used by Hugging Face's Open LLM Leaderboard.
Uses
Direct Use
[More Information Needed]
Downstream Use
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
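A minimal sketch of one way to load the adapter on top of the base model with PEFT, assuming transformers, peft, accelerate, and torch are installed and a GPU with enough memory for fp16 weights is available. ADAPTER_ID is a hypothetical placeholder for this repo's Hub id, and the helper names are illustrative. The prompt template used during finetuning is not documented here, so the sketch passes the prompt through unchanged.

```python
BASE_MODEL_ID = "mistralai/Mistral-7B-v0.1"
# Hypothetical placeholder: replace with the actual Hub id of this adapter repo.
ADAPTER_ID = "<user>/<this-adapter-repo>"


def load_model_and_tokenizer(adapter_id: str = ADAPTER_ID):
    """Load the fp16 base model, then attach the PEFT adapter weights."""
    # Heavy imports live inside the function so importing this file stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The tokenizer ships with the adapter repo, so it is loaded from there.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.float16,
        device_map="auto",  # requires the accelerate package
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a completion and return only the newly produced text."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Typical usage would be `model, tokenizer = load_model_and_tokenizer()` followed by `generate(model, tokenizer, "Your instruction here")`.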
Training Details
Training Data
[More Information Needed]
Preprocessing
[More Information Needed]
Training Hyperparameters
We used the SFTTrainer from the TRL library, which wraps the transformers Trainer to simplify fine-tuning on instruction-based datasets.
The following TrainingArguments config was used:
- num_train_epochs = 1
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- bf16 = True
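The values listed above could be expressed in code roughly as follows. This is a sketch, not the author's training script: the output_dir value and the helper name build_training_args are assumptions for illustration, and the SFTTrainer call itself is omitted because its arguments differ across TRL versions.

```python
# Values copied from the list above.
TRAINING_KWARGS = {
    "num_train_epochs": 1,
    "auto_find_batch_size": True,
    "gradient_accumulation_steps": 1,
    "optim": "paged_adamw_32bit",
    "save_strategy": "epoch",
    "learning_rate": 3e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "logging_strategy": "steps",
    "logging_steps": 25,
    "bf16": True,
}


def build_training_args(output_dir: str = "./results"):
    """Build a TrainingArguments object (output_dir is a hypothetical path)."""
    # Imported lazily so the kwargs can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Because bf16 is enabled, constructing the arguments is expected to need a GPU with bfloat16 support, such as the A100 mentioned in the model description.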
The following bitsandbytes quantization config was used:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
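As a sketch of how these settings map onto the transformers API, the listed values can be passed to BitsAndBytesConfig and then to from_pretrained. The helper names here are illustrative assumptions, and loading requires the bitsandbytes and accelerate packages plus a CUDA GPU.

```python
# Values copied from the list above. quant_method is not passed here because
# BitsAndBytesConfig sets it internally.
BNB_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
}


def build_bnb_config():
    """Build the 4-bit NF4 quantization config with bfloat16 compute."""
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.bfloat16, **BNB_KWARGS)


def load_quantized_base(model_id: str = "mistralai/Mistral-7B-v0.1"):
    """Load the base model in 4-bit, ready for adapter training or inference."""
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=build_bnb_config(),
        device_map="auto",  # requires the accelerate package
    )
```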
Speeds, Sizes, Times
runtime / 50 tokens (sec) | GPU | attn | torch dtype | VRAM (GB) |
---|---|---|---|---|
3.1 | 1x A100 (40 GB SXM) | torch | fp16 | 13 |
Model Card Contact
dryanfurman at gmail
Framework versions
- PEFT 0.6.0.dev0