---
license: unknown
library_name: peft
tags:
  - mistral
datasets:
  - ehartford/dolphin
  - garage-bAInd/Open-Platypus
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---

# mistral-7b-instruct-v0.1

A general instruction-following LLM finetuned from mistralai/Mistral-7B-v0.1.

## Model Details

### Model Description

This instruction-following LLM was built via parameter-efficient QLoRA finetuning of mistralai/Mistral-7B-v0.1 on the first 200k rows of ehartford/dolphin. Finetuning was executed on 1x A100 (40 GB SXM) for roughly 20 hours on Google Colab. Only the PEFT adapter weights are included in this model repo, alongside the tokenizer.

  • Developed by: Daniel Furman
  • Model type: Decoder-only
  • Language(s) (NLP): English
  • License: Apache 2.0 (the base model, mistralai/Mistral-7B-v0.1, is Apache 2.0)
  • Finetuned from model: mistralai/Mistral-7B-v0.1

### Model Sources

[More Information Needed]

## Evaluation Results

| Metric              | Value  |
|---------------------|--------|
| MMLU (5-shot)       | Coming |
| ARC (25-shot)       | Coming |
| HellaSwag (10-shot) | Coming |
| TruthfulQA (0-shot) | Coming |
| Avg.                | Coming |

We use EleutherAI's Language Model Evaluation Harness to run the benchmark tests above, using the same version as Hugging Face's Open LLM Leaderboard.

## Uses

### Direct Use

[More Information Needed]

### Downstream Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
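Pending an official snippet, the sketch below shows one plausible way to load the PEFT adapter on top of the 4-bit base model. The repo id `dfurman/mistral-7b-instruct-v0.1` and the Alpaca-style prompt template are assumptions not confirmed by this card; adjust both to the actual adapter repo and training format.

```python
def build_prompt(instruction: str) -> str:
    """Assumed Alpaca-style prompt format; swap in the template the adapter
    was actually trained with if it differs."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def generate_response(instruction: str, max_new_tokens: int = 50) -> str:
    """Hedged sketch of adapter inference; run on a machine with a GPU and
    the transformers, peft, and bitsandbytes packages installed."""
    # Heavyweight deps imported here so build_prompt stays importable anywhere.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    base = "mistralai/Mistral-7B-v0.1"
    adapter = "dfurman/mistral-7b-instruct-v0.1"  # assumed repo id

    tokenizer = AutoTokenizer.from_pretrained(adapter)
    model = AutoModelForCausalLM.from_pretrained(
        base,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
        ),
        device_map="auto",
    )
    # Attach the QLoRA adapter weights from this repo to the base model.
    model = PeftModel.from_pretrained(model, adapter)

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Call `generate_response("Write a haiku about mountains.")` on a GPU machine to try it.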

## Training Details

### Training Data

[More Information Needed]

### Preprocessing

[More Information Needed]

### Training Hyperparameters

We used `SFTTrainer` from the TRL library, which provides a wrapper around the transformers `Trainer` to easily fine-tune models on instruction-based datasets.

The following `TrainingArguments` config was used:

  • num_train_epochs = 1
  • auto_find_batch_size = True
  • gradient_accumulation_steps = 1
  • optim = "paged_adamw_32bit"
  • save_strategy = "epoch"
  • learning_rate = 3e-4
  • lr_scheduler_type = "cosine"
  • warmup_ratio = 0.03
  • logging_strategy = "steps"
  • logging_steps = 25
  • bf16 = True
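The settings above can be collected into a dict and unpacked into `transformers.TrainingArguments`. This is a minimal sketch of that mapping, not the exact training script; `output_dir="outputs"` is an illustrative placeholder.

```python
# The TrainingArguments settings listed above, expressed as plain kwargs.
training_kwargs = {
    "num_train_epochs": 1,
    "auto_find_batch_size": True,
    "gradient_accumulation_steps": 1,
    "optim": "paged_adamw_32bit",
    "save_strategy": "epoch",
    "learning_rate": 3e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "logging_strategy": "steps",
    "logging_steps": 25,
    "bf16": True,
}

# On a training machine with transformers installed:
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="outputs", **training_kwargs)  # output_dir is a placeholder
# then pass `args` to trl.SFTTrainer along with the model and dataset.
```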

The following bitsandbytes quantization config was used:

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: bfloat16
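The quantization settings above correspond to `transformers.BitsAndBytesConfig` kwargs; the `llm_int8_*` fields are the recorded defaults and only apply to 8-bit loading. A sketch of the mapping, assuming transformers and bitsandbytes are installed on the target machine:

```python
# The bitsandbytes quantization settings above as BitsAndBytesConfig kwargs.
# Only the 4-bit fields matter here since load_in_4bit is True.
bnb_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    # "bnb_4bit_compute_dtype": torch.bfloat16,  # requires torch on the machine
}

# On a machine with transformers + bitsandbytes:
# from transformers import BitsAndBytesConfig
# quant_config = BitsAndBytesConfig(**bnb_kwargs)
# model = AutoModelForCausalLM.from_pretrained(base, quantization_config=quant_config)
```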

### Speeds, Sizes, Times

| runtime / 50 tokens (sec) | GPU                  | attn  | torch dtype | VRAM (GB) |
|---------------------------|----------------------|-------|-------------|-----------|
| 3.1                       | 1x A100 (40 GB SXM)  | torch | fp16        | 13        |
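The table's timing implies a throughput of roughly 16 tokens/sec. A small helper makes the arithmetic explicit, with the timing pattern sketched in comments (the `generate` call is from the loading example above and assumes a GPU machine):

```python
def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Throughput implied by a timing row like the one above."""
    return n_tokens / seconds

# Timing pattern (sketch), to reproduce the measurement on a GPU machine:
# import time
# start = time.perf_counter()
# out = model.generate(**inputs, max_new_tokens=50)
# secs = time.perf_counter() - start

# The table's 3.1 s for 50 tokens:
print(round(tokens_per_second(50, 3.1), 1))  # → 16.1
```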

## Model Card Contact

dryanfurman at gmail

## Framework versions

  • PEFT 0.6.0.dev0