---
license: apache-2.0
library_name: peft
tags:
- mistral
datasets:
- jondurbin/airoboros-2.2.1
inference: false
pipeline_tag: text-generation
base_model: mistralai/Mistral-7B-v0.1
---
# Mistral-7B-Instruct-v0.1

The Mistral-7B-Instruct-v0.1 LLM is a pretrained generative text model with 7 billion parameters geared towards instruction-following capabilities.

## Model Details

This model was built via parameter-efficient finetuning of the [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) base model on the [jondurbin/airoboros-2.2.1](https://huggingface.co/datasets/jondurbin/airoboros-2.2.1) dataset.

- **Developed by:** Daniel Furman
- **Model type:** Decoder-only
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

## Model Sources

- **Repository:** [here](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/mistral/sft_Mistral_7B_Instruct_v0_1_peft.ipynb)

## Evaluation Results

| Metric                | Value |
|-----------------------|-------|
| MMLU (5-shot)         | Coming |
| ARC (25-shot)         | Coming |
| HellaSwag (10-shot)   | Coming |
| TruthfulQA (0-shot)   | Coming |
| Avg.                  | Coming |

We use Eleuther.AI's [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) to run the benchmark tests above, the same version as Hugging Face's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

## Basic Usage
Setup

```python
!pip install -q -U transformers peft torch accelerate einops sentencepiece
```

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
```

```python
peft_model_id = "dfurman/Mistral-7B-Instruct-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(
    peft_model_id,
    use_fast=True,
    trust_remote_code=True,
)

model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(
    model,
    peft_model_id
)
```
```python
messages = [
    {"role": "user", "content": "Tell me a recipe for a mai tai."},
]

print("\n\n*** Prompt:")
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
)
print(tokenizer.decode(input_ids[0]))
```
Prompt

```python
" [INST] Tell me a recipe for a mai tai. [/INST]"
```
```python
print("\n\n*** Generate:")
with torch.autocast("cuda", dtype=torch.bfloat16):
    output = model.generate(
        input_ids=input_ids.cuda(),
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.7,
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
        no_repeat_ngram_size=5,
    )

response = tokenizer.decode(
    output["sequences"][0][len(input_ids[0]):],
    skip_special_tokens=True
)

print(response)
```
Generation

```python
"""1 oz light rum
½ oz dark rum
¼ oz orange curaçao
2 oz pineapple juice
¾ oz lime juice
Dash of orgeat syrup (optional)
Splash of grenadine (for garnish, optional)
Lime wheel and cherry garnishes (optional)

Shake all ingredients except the splash of grenadine in a cocktail shaker over ice. Strain into an old-fashioned glass filled with fresh ice cubes. Gently pour the splash of grenadine down the side of the glass so that it sinks to the bottom. Add garnishes as desired."""
```
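The sampling settings passed to `model.generate()` above (`temperature=0.7`, `repetition_penalty=1.2`) reshape the next-token distribution before a token is drawn. The sketch below illustrates the idea on a toy four-token vocabulary using plain Python; the function names and the example logits are hypothetical, and the real logic inside `transformers` operates on batched tensors, so treat this as a simplified illustration rather than the library's implementation.

```python
import math


def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    """Make already-generated tokens less likely (simplified sketch)."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further down.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def softmax_with_temperature(logits, temperature=0.7):
    """Turn logits into probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example (assumed values): token 0 has already been generated once.
logits = [2.0, 1.0, 0.5, -1.0]
penalized = apply_repetition_penalty(logits, seen_token_ids=[0], penalty=1.2)
probs = softmax_with_temperature(penalized, temperature=0.7)
baseline = softmax_with_temperature(logits, temperature=0.7)

print([round(p, 3) for p in baseline])
print([round(p, 3) for p in probs])
```

Comparing the two printed lists shows that the previously generated token loses probability mass once the penalty is applied, while a temperature below 1 keeps the distribution concentrated on the highest-scoring tokens.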
## Speeds, Sizes, Times

| runtime / 50 tokens (sec) | GPU                 | dtype         | VRAM (GB) |
|:-------------------------:|:-------------------:|:-------------:|:---------:|
| 3.44                      | 1x A100 (40 GB SXM) | torch.float16 | 16        |

## Training

It took ~2 hours to train 2 epochs on 1x A100 (40 GB SXM).

### Prompt Format

This model was finetuned with the following format:

```python
tokenizer.chat_template = "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST] ' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}"
```

This format is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating) via the `apply_chat_template()` method. Here's an illustrative example:

```python
messages = [
    {"role": "user", "content": "Tell me a recipe for a mai tai."},
    {"role": "assistant", "content": "1 oz light rum\n½ oz dark rum\n¼ oz orange curaçao\n2 oz pineapple juice\n¾ oz lime juice\nDash of orgeat syrup (optional)\nSplash of grenadine (for garnish, optional)\nLime wheel and cherry garnishes (optional)\n\nShake all ingredients except the splash of grenadine in a cocktail shaker over ice. Strain into an old-fashioned glass filled with fresh ice cubes. Gently pour the splash of grenadine down the side of the glass so that it sinks to the bottom. Add garnishes as desired."},
    {"role": "user", "content": "How can I make it more upscale and luxurious?"},
]

print("\n\n*** Prompt:")
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_tensors="pt",
)
print(tokenizer.decode(input_ids[0]))
```
Output

```python
[INST] Tell me a recipe for a mai tai. [/INST] 1 oz light rum\n½ oz dark rum\n¼ oz orange curaçao\n2 oz pineapple juice\n¾ oz lime juice\nDash of orgeat syrup (optional)\nSplash of grenadine (for garnish, optional)\nLime wheel and cherry garnishes (optional)\n\nShake all ingredients except the splash of grenadine in a cocktail shaker over ice. Strain into an old-fashioned glass filled with fresh ice cubes. Gently pour the splash of grenadine down the side of the glass so that it sinks to the bottom. Add garnishes as desired. [INST] How can I make it more upscale and luxurious? [/INST]
```
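To make the template logic easier to follow, here is a minimal plain-Python sketch of what the Jinja chat template above does with a list of messages. The function name `render_prompt` is hypothetical, and the default `bos_token` and `eos_token` values (`<s>` and `</s>`) are assumptions based on the Mistral tokenizer. The rendered string can also differ slightly in whitespace from the result of tokenizing and decoding, so use `apply_chat_template()` for real inputs.

```python
def render_prompt(messages, bos_token="<s>", eos_token="</s>"):
    """Mirror the chat template: alternate user/assistant turns in [INST] format."""
    parts = [bos_token]
    for index, message in enumerate(messages):
        # Even positions must be user turns, odd positions assistant turns.
        expected_user = index % 2 == 0
        if (message["role"] == "user") != expected_user:
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if message["role"] == "user":
            parts.append("[INST] " + message["content"] + " [/INST] ")
        elif message["role"] == "assistant":
            # Assistant turns are closed with the end-of-sequence token.
            parts.append(message["content"] + eos_token + " ")
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(parts)


example = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Bye"},
]
prompt = render_prompt(example)
print(prompt)
```

Each user turn is wrapped in `[INST] ... [/INST]`, each assistant turn is terminated with the end-of-sequence token, and any conversation that does not strictly alternate roles starting with the user is rejected.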
### Training Hyperparameters

We use the [SFTTrainer](https://huggingface.co/docs/trl/main/en/sft_trainer) from `trl` to fine-tune LLMs on instruction-following datasets.

See [here](https://github.com/daniel-furman/sft-demos/blob/main/src/sft/mistral/sft_Mistral_7B_Instruct_v0_1_peft.ipynb) for the finetuning code, which contains an exhaustive view of the hyperparameters employed.

The following `TrainingArguments` config was used:

- output_dir = "./results"
- num_train_epochs = 3
- auto_find_batch_size = True
- gradient_accumulation_steps = 1
- optim = "paged_adamw_32bit"
- save_strategy = "epoch"
- learning_rate = 3e-4
- lr_scheduler_type = "cosine"
- warmup_ratio = 0.03
- logging_strategy = "steps"
- logging_steps = 25
- evaluation_strategy = "epoch"
- prediction_loss_only = True
- bf16 = True

The following `bitsandbytes` quantization config was used:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16

## Model Card Contact

dryanfurman at gmail

### Framework versions

- PEFT 0.6.3.dev0