---
license: mit
library_name: transformers
pipeline_tag: text-generation
base_model:
  - nvidia/Llama-3.1-Minitron-4B-Depth-Base
datasets:
  - BAAI/Infinity-Instruct
---

We fine-tune nvidia/Llama-3.1-Minitron-4B-Depth-Base with the LLM-Neo method, which combines LoRA and knowledge distillation (KD). The training data consists of 100k samples drawn from BAAI/Infinity-Instruct.

This repository contains the model described in the paper LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models. The project page is available here, and the GitHub repository is available here.
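
To illustrate how LoRA and KD can be combined in a single objective, the sketch below shows one plausible training step: the student (the Minitron base model wrapped with LoRA adapters via `peft`) is optimized on a weighted sum of the usual language-modeling loss and a KL-divergence distillation loss against a frozen teacher. This is a minimal illustration only; the teacher model ID, the loss weight `alpha`, the temperature, and the LoRA hyperparameters are assumptions, not the exact settings used to produce this checkpoint.

```python
# Minimal sketch of a combined LoRA + KD training step.
# Hyperparameters and the teacher choice are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

student_id = "nvidia/Llama-3.1-Minitron-4B-Depth-Base"
teacher_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed teacher, not confirmed by this card

tokenizer = AutoTokenizer.from_pretrained(student_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers have no pad token by default

student = AutoModelForCausalLM.from_pretrained(student_id, torch_dtype=torch.bfloat16, device_map="auto")
teacher = AutoModelForCausalLM.from_pretrained(teacher_id, torch_dtype=torch.bfloat16, device_map="auto")
teacher.eval()  # the teacher stays frozen; only LoRA adapter weights are trained

# Wrap the student with LoRA adapters (assumed rank and target modules).
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
student = get_peft_model(student, lora_config)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
alpha, temperature = 0.5, 2.0  # assumed KD weight and softmax temperature

def train_step(batch_texts):
    inputs = tokenizer(batch_texts, return_tensors="pt", padding=True, truncation=True).to(student.device)
    labels = inputs["input_ids"].clone()

    student_out = student(**inputs, labels=labels)  # cross-entropy loss + logits
    with torch.no_grad():
        teacher_logits = teacher(**inputs).logits

    # KL divergence between temperature-scaled student and teacher distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_out.logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    loss = (1 - alpha) * student_out.loss + alpha * kd_loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```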

## Basic Usage

This example demonstrates generating text with the model. You'll need to install the necessary libraries first: `pip install transformers torch accelerate` (`accelerate` is required for `device_map="auto"`).

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_path = "yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w"

# Load the tokenizer and the model; device_map="auto" places the weights on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16
)

# Tokenize the prompt and move it to the same device as the model.
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 50 new tokens.
generation_config = GenerationConfig(
    max_new_tokens=50, do_sample=True, temperature=0.7
)

outputs = model.generate(**inputs, generation_config=generation_config)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)
```

## Benchmarks

In this section, we report results for Llama-3.1-Minitron-4B-Depth-Neo-10w on standard automatic benchmarks. For all evaluations, we use the lm-evaluation-harness library.
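
As a rough guide to reproducing these numbers, the snippet below shows how the model might be scored with lm-evaluation-harness's Python API. The task names, few-shot settings, and batch size are illustrative assumptions and may differ from the exact evaluation configuration used here.

```python
# Illustrative sketch using lm-evaluation-harness (v0.4+); task names and settings are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w,dtype=bfloat16",
    tasks=["mmlu", "cmmlu", "ceval-valid", "bbh_fewshot"],  # assumed task selection
    batch_size=8,
)

# Print the aggregated metrics for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```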

### Evaluation results

| Category | Benchmark | Version | n-shot | Metric | Value | Stderr |
|----------|-----------|---------|--------|--------|-------|--------|
| BBH | BBH (General) | N/A | 3 | exact_match | 0.4729 | ± 0.0055 |
| BBH | BBH (Boolean Expressions) | 2 | 3 | exact_match | 0.8120 | ± 0.0248 |
| BBH | BBH (Date Understanding) | 2 | 3 | exact_match | 0.6600 | ± 0.0300 |
| CEVAL | CEVAL (General) | N/A | 0 | acc | 0.4413 | ± 0.0135 |
| CEVAL | CEVAL (Accountant) | 1 | 0 | acc | 0.3469 | ± 0.0687 |
| CEVAL | CEVAL (Advanced Mathematics) | 1 | 0 | acc | 0.4737 | ± 0.1177 |
| CEVAL | CEVAL (Art Studies) | 1 | 0 | acc | 0.4545 | ± 0.0880 |
| MMLU | MMLU (General) | N/A | 0 | acc | 0.6048 | ± 0.0039 |
| MMLU | MMLU (Humanities) | N/A | 0 | acc | 0.5552 | ± 0.0067 |
| MMLU | MMLU (STEM) | N/A | 0 | acc | 0.5214 | ± 0.0086 |
| CMMLU | CMMLU (General) | N/A | 0 | acc | 0.3548 | ± 0.0044 |
| CMMLU | CMMLU (Normalized) | N/A | 0 | acc_norm | 0.3548 | ± 0.0044 |