Enhance model card with metadata, paper link, and basic usage (#1)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,17 +1,45 @@
---
base_model:
- nvidia/Llama-3.1-Minitron-4B-Depth-Base
---

-We fine-tune nvidia/Llama-3.1-Minitron-4B-Depth-Base with LLM-Neo method

-## Benchmarks

-In this section, we report the results for Llama-3.1-Minitron-4B-Depth-Neo-10w on standard automatic benchmarks. For all the evaluations, we use [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.

### Evaluation results

@@ -136,4 +164,4 @@
<td>0.3548</td>
<td>± 0.0044</td>
</tr>
-</table>

---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
base_model:
- nvidia/Llama-3.1-Minitron-4B-Depth-Base
+datasets:
+- BAAI/Infinity-Instruct
---

+We fine-tune `nvidia/Llama-3.1-Minitron-4B-Depth-Base` with the LLM-Neo method, which combines LoRA and knowledge distillation (KD). The training data is 100k lines sampled from `BAAI/Infinity-Instruct`.
+
+This repository contains the model described in the paper [LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models](https://hf.co/papers/2411.06839).
+The project page is available [here](https://huggingface.co/collections/yang31210999/llm-neo-66e3c882f5579b829ff57eba) and the GitHub repository is available [here](https://github.com/yang3121099/LLM-Neo).
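
For intuition only, here is a rough sketch of what a combined LoRA + KD training step can look like. This is not the released LLM-Neo training code: the teacher model, LoRA configuration, distillation temperature, and loss weighting below are assumptions, and `peft` is assumed to be installed.

```python
# Illustrative LoRA + KD sketch, NOT the released LLM-Neo recipe.
# Teacher choice, LoRA config, temperature, and loss weights are assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

student_id = "nvidia/Llama-3.1-Minitron-4B-Depth-Base"
teacher_id = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical teacher

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id, torch_dtype=torch.bfloat16)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id, torch_dtype=torch.bfloat16).eval()

# Wrap the student with LoRA adapters so only the low-rank weights are trained.
student = get_peft_model(
    student, LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

batch = tokenizer("Example instruction and response", return_tensors="pt")
out = student(**batch, labels=batch["input_ids"])   # standard next-token loss
with torch.no_grad():
    teacher_logits = teacher(**batch).logits         # frozen teacher signal

T = 2.0  # assumed distillation temperature
kd_loss = F.kl_div(
    F.log_softmax(out.logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T

loss = 0.5 * out.loss + 0.5 * kd_loss  # assumed equal weighting of LM and KD terms
loss.backward()
optimizer.step()
```

Only the adapter weights receive gradients here, which is what keeps the distillation parameter-efficient.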

+## Basic Usage
+
+This example demonstrates generating text using the model. You'll need to install the necessary libraries first: `pip install transformers`.
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+import torch
+
+model_path = "yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w"
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16)
+
+prompt = "Once upon a time"
+inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+generation_config = GenerationConfig(
+    max_new_tokens=50, do_sample=True, temperature=0.7
+)
+
+outputs = model.generate(**inputs, generation_config=generation_config)
+generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
+print(generated_text)
+```
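
Because the model is fine-tuned on instruction data, a chat-formatted prompt may work better than free-form text. The optional snippet below reuses `tokenizer`, `model`, and `generation_config` from the example above and assumes the checkpoint's tokenizer ships a chat template, which should be verified.

```python
# Optional: only applies if the tokenizer defines a chat template (an assumption here).
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": "Explain knowledge distillation in one sentence."}]
    chat_inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(chat_inputs, generation_config=generation_config)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```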

+## Benchmarks
+
+In this section, we report the results for `Llama-3.1-Minitron-4B-Depth-Neo-10w` on standard automatic benchmarks. For all evaluations, we use the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) library.
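
As a pointer for reproducing this kind of evaluation, here is a minimal sketch using the harness's Python API. It assumes lm-eval >= 0.4 (`pip install lm-eval`); the task list and batch size are illustrative and are not the exact settings behind the results table.

```python
# Minimal evaluation sketch; tasks and batch size are assumptions, not the exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yang31210999/Llama-3.1-Minitron-4B-Depth-Neo-10w,dtype=bfloat16",
    tasks=["arc_easy", "hellaswag", "winogrande"],
    batch_size=8,
)
print(results["results"])
```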

### Evaluation results

<td>0.3548</td>
<td>± 0.0044</td>
</tr>
+</table>