metadata
language:
  - en
license: apache-2.0
datasets:
  - HuggingFaceH4/no_robots
base_model: mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
thumbnail: >-
  https://huggingface.co/mrm8488/mistral-7b-ft-h4-no_robots_instructions/resolve/main/mistralh4-removebg-preview.png?download=true
model-index:
  - name: mistral-7b-ft-h4-no_robots_instructions
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 60.92
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mrm8488/mistral-7b-ft-h4-no_robots_instructions
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 83.17
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mrm8488/mistral-7b-ft-h4-no_robots_instructions
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 63.37
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mrm8488/mistral-7b-ft-h4-no_robots_instructions
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 43.63
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mrm8488/mistral-7b-ft-h4-no_robots_instructions
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 78.85
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mrm8488/mistral-7b-ft-h4-no_robots_instructions
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 37
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mrm8488/mistral-7b-ft-h4-no_robots_instructions
          name: Open LLM Leaderboard
[limstral logo image]

Mistral 7B fine-tuned on H4/No Robots instructions

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1 on the HuggingFaceH4/no_robots dataset for the instruction-following downstream task.
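
As a rough, hedged sketch (the exact preprocessing is not documented in this card), the dataset can be loaded and flattened into a "text" column in the [INST] ... [/INST] style used later in the Usage section, which matches the dataset_text_field="text" setting of the trainer below:

# Illustrative sketch only: the split names and the prompt template are assumptions,
# not details taken from the original training script.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/no_robots")  # split names may differ between dataset revisions

def to_text(example):
    # no_robots stores each sample as a list of {"role", "content"} messages;
    # this keeps only the first user/assistant exchange of each conversation
    msgs = example["messages"]
    user = next(m["content"] for m in msgs if m["role"] == "user")
    assistant = next(m["content"] for m in msgs if m["role"] == "assistant")
    return {"text": f"[INST] {user} [/INST] {assistant}"}

train_ds = ds["train"].map(to_text)
test_ds = ds["test"].map(to_text)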

Training procedure

The model was loaded in 8-bit precision and fine-tuned on the no_robots dataset using the LoRA PEFT technique with the huggingface/peft library and the trl SFTTrainer, for one epoch on 1 x A100 (40GB) GPU.
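
A minimal sketch of that setup, assuming bitsandbytes is installed; variable and helper names here are illustrative, not taken from the original script:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

base_id = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

# Load the base model in 8-bit and make it ready for LoRA fine-tuning
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)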

SFT Trainer params:

# model, tokenizer, train_ds, test_ds, peft_config and training_arguments
# are assumed to be defined earlier in the training script
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=2048,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)

LoRA config:

from peft import LoraConfig

config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['q_proj', 'k_proj', 'down_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj'],
)
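
When peft_config is passed to SFTTrainer as above, the trainer applies the adapter itself; the equivalent manual step, shown here only to illustrate what this config does, would be:

from peft import get_peft_model

# Wrap the base model with LoRA adapters on the listed attention and MLP projections
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the low-rank adapter matrices are trainable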

Training hyperparameters

The following hyperparameters were used during training (a matching TrainingArguments sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 66
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2
  • mixed_precision_training: Native AMP
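
A TrainingArguments sketch matching these values; output_dir and the logging/eval cadence are assumptions, and "Native AMP" is mapped to fp16=True, since the original arguments file is not included in this card:

from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./mistral-7b-ft-h4-no_robots_instructions",  # assumed
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=64,   # 2 x 64 = total train batch size of 128
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=2,
    seed=66,
    fp16=True,                        # "Native AMP" mixed precision
    optim="adamw_torch",              # Adam with betas=(0.9, 0.999), eps=1e-8
    logging_steps=10,                 # assumed from the 10-step cadence in the results below
    evaluation_strategy="steps",
    eval_steps=10,
)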

Training results

Step    Training Loss    Validation Loss
10      1.796200         1.774305
20      1.769700         1.679720
30      1.626800         1.667754
40      1.663400         1.665188
50      1.565700         1.659000
60      1.660300         1.658270

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

repo_id = "mrm8488/mistral-7b-ft-h4-no_robots_instructions"

model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

gen = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

instruction = "[INST] Write an email to say goodbye to me boss [\INST]"
res = gen(instruction, max_new_tokens=512, temperature=0.3, top_p=0.75, top_k=40, repetition_penalty=1.2, eos_token_id=2)
print(res[0]['generated_text'])
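
For finer control, the same generation can be done without the pipeline helper; this is an equivalent sketch, not code from the original card:

# Assumes model and tokenizer from the snippet above, with the model already on GPU
prompt = "[INST] Write an email to say goodbye to my boss [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.3,
        top_p=0.75,
        top_k=40,
        repetition_penalty=1.2,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))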

Framework versions

  • Transformers 4.35.0.dev0
  • Pytorch 2.1.0+cu118
  • Datasets 2.14.6
  • Tokenizers 0.14.1

Citation

@misc {manuel_romero_2023,
    author       = { {Manuel Romero} },
    title        = { mistral-7b-ft-h4-no_robots_instructions (Revision 785446d) },
    year         = 2023,
    url          = { https://huggingface.co/mrm8488/mistral-7b-ft-h4-no_robots_instructions },
    doi          = { 10.57967/hf/1426 },
    publisher    = { Hugging Face }
}

Open LLM Leaderboard Evaluation Results

Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mrm8488/mistral-7b-ft-h4-no_robots_instructions

Metric                               Value
Avg.                                 61.16
AI2 Reasoning Challenge (25-Shot)    60.92
HellaSwag (10-Shot)                  83.17
MMLU (5-Shot)                        63.37
TruthfulQA (0-shot)                  43.63
Winogrande (5-shot)                  78.85
GSM8k (5-shot)                       37.00
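
These scores were produced by the leaderboard's evaluation harness; a roughly comparable local run, assuming the lm-evaluation-harness v0.4+ Python API and enough GPU memory for bf16 inference, could look like this (task names and result keys follow the harness, not this card):

import lm_eval

# Example: reproduce the ARC-Challenge 25-shot setting locally
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mrm8488/mistral-7b-ft-h4-no_robots_instructions,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size=8,
)
print(results["results"])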