yang31210999 and nielsr (HF Staff) committed

Commit 93a1a5a · verified · 1 parent: b6b561a

Enhance model card with metadata, paper link, and basic usage (#1)


- Enhance model card with metadata, paper link, and basic usage (400109fd74b565af86dd83d6be47b58b7051daf4)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +34 -6
README.md CHANGED
@@ -1,17 +1,45 @@
  ---
- datasets:
- - BAAI/Infinity-Instruct
+ license: mit
+ library_name: transformers
+ pipeline_tag: text-generation
  base_model:
  - nvidia/Llama-3.1-Minitron-4B-Depth-Base
+ datasets:
+ - BAAI/Infinity-Instruct
  ---

- We fine-tune nvidia/Llama-3.1-Minitron-4B-Depth-Base with LLM-Neo methodwhich combines LoRA and KD in one. Training data is sampling from BAAI/Infinity-Instruct for 100k lines.
+ We fine-tune `nvidia/Llama-3.1-Minitron-4B-Depth-Base` with the LLM-Neo method, which combines LoRA and KD. Training data is sampled from `BAAI/Infinity-Instruct` for 100k lines.

+ This repository contains the model described in the paper [LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models](https://hf.co/papers/2411.06839).
+ The project page is available [here](https://huggingface.co/collections/yang31210999/llm-neo-66e3c882f5579b829ff57eba) and the Github repository is available [here](https://github.com/yang3121099/LLM-Neo).

+ ## Basic Usage
+
+ This example demonstrates generating text using the model. You'll need to install the necessary libraries first: `pip install transformers`.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+ import torch
+
+ model_path = "yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w"
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16)
+
+ prompt = "Once upon a time"
+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+ generation_config = GenerationConfig(
+     max_new_tokens=50, do_sample=True, temperature=0.7
+ )
+
+ outputs = model.generate(**inputs, generation_config=generation_config)
+ generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
+ print(generated_text)
+
+ ```

- ## Benchmarks
+ ## Benchmarks

- In this section, we report the results for Llama-3.1-Minitron-4B-Depth-Neo-10w on standard automatic benchmarks. For all the evaluations, we use [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.
+ In this section, we report the results for `Llama-3.1-Minitron-4B-Depth-Neo-10w` on standard automatic benchmarks. For all the evaluations, we use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.

  ### Evaluation results

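For anyone reproducing the benchmark table, the harness exposes a Python entry point, `lm_eval.simple_evaluate`, in addition to its CLI. Below is a minimal sketch; the task list, dtype, and batch size are assumptions for illustration, since the card does not state the exact evaluation settings.

```python
# Minimal sketch: scoring the released checkpoint with lm-evaluation-harness
# (pip install lm-eval). The task set and batch size below are assumptions;
# the model card does not list the exact tasks or settings used.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "winogrande"],  # assumed task set
    batch_size=8,
)
print(results["results"])  # per-task metrics with stderr, like the table in the card
```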
 
 
@@ -136,4 +164,4 @@
  <td>0.3548</td>
  <td>± 0.0044</td>
  </tr>
- </table>
+ </table>
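The card summarizes LLM-Neo as combining LoRA and KD in one method. As a rough illustration of that idea rather than the authors' exact recipe, the sketch below puts a LoRA adapter on the Minitron student and adds a distillation term against a frozen teacher; the teacher checkpoint, LoRA rank, `kd_alpha`, and `temperature` are all assumptions.

```python
# Sketch of one LLM-Neo-style training step: LoRA on the student plus a
# knowledge-distillation loss against a frozen teacher. Teacher checkpoint
# and hyperparameters are illustrative assumptions, not the paper's values.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

teacher = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # assumed teacher (shares the Llama-3.1 vocab)
    torch_dtype=torch.bfloat16, device_map="auto",
)
teacher.eval()  # frozen; only provides soft targets

student = AutoModelForCausalLM.from_pretrained(
    "nvidia/Llama-3.1-Minitron-4B-Depth-Base",
    torch_dtype=torch.bfloat16, device_map="auto",
)
# Parameter-efficient part: only the low-rank adapters are trainable.
student = get_peft_model(student, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

tokenizer = AutoTokenizer.from_pretrained("nvidia/Llama-3.1-Minitron-4B-Depth-Base")
batch = tokenizer("An instruction sampled from Infinity-Instruct", return_tensors="pt").to(student.device)

kd_alpha, temperature = 0.5, 2.0  # illustrative weighting and softening

labels = batch["input_ids"].clone()
out = student(**batch, labels=labels)  # cross-entropy loss on the data
with torch.no_grad():
    teacher_logits = teacher(**batch).logits

# KD part: KL divergence between temperature-softened distributions.
kd_loss = F.kl_div(
    F.log_softmax(out.logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature**2

loss = (1 - kd_alpha) * out.loss + kd_alpha * kd_loss
loss.backward()  # gradients flow only into the LoRA adapters
```

Because only the adapter weights receive gradients, the memory footprint stays close to plain LoRA while the KD term pulls the student toward the teacher's output distribution.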