lucyknada committed
Commit 75dfd4d · verified · 1 Parent(s): 7d2039d

Upload ./README.md with huggingface_hub

Files changed (1): README.md (+90, -0)
README.md ADDED
 
---
library_name: transformers
language:
- en
license: cc-by-nc-4.0
---
### exl3 quant
---
### check revisions for quants
---
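Per the note above, the quant files live in the repository's revisions rather than in `main`. A minimal sketch of pulling a specific revision with `huggingface_hub` (the repo id and revision name below are placeholders, not taken from this card):

```python
from huggingface_hub import snapshot_download

# Placeholder repo id and revision name -- substitute the actual exl3 quant repo
# and the revision label (e.g. a bits-per-weight branch) listed on the model page.
local_dir = snapshot_download(
    repo_id="lucyknada/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct-exl3",  # placeholder
    revision="4.0bpw",  # placeholder revision name
)
print(local_dir)  # path to the downloaded quant
```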

# Model Information

We introduce **Nemotron-UltraLong-8B**, a series of ultra-long context language models designed to process extensive sequences of text (up to 1M, 2M, and 4M tokens) while maintaining competitive performance on standard benchmarks. Built on Llama-3.1, UltraLong-8B leverages a systematic training recipe that combines efficient continued pretraining with instruction tuning to enhance long-context understanding and instruction-following capabilities. This approach enables our models to efficiently scale their context windows without sacrificing general performance.

## The UltraLong Models

- [nvidia/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct](https://huggingface.co/nvidia/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct)
- [nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct](https://huggingface.co/nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct)
- [nvidia/Llama-3.1-Nemotron-8B-UltraLong-4M-Instruct](https://huggingface.co/nvidia/Llama-3.1-Nemotron-8B-UltraLong-4M-Instruct)

## Uses

Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.
```python
import transformers
import torch

model_id = "nvidia/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct"

# Load the model in bfloat16 and let Accelerate spread it over available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The pipeline returns the full chat; the last message is the assistant's reply.
print(outputs[0]["generated_text"][-1])
```
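The card also mentions the Auto classes with `generate()`; a minimal sketch of that route follows (the decoding settings are illustrative, not values prescribed by this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Apply the model's chat template and move the prompt to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```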

## Model Card

* Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
* Continued Pretraining: The training data consists of 1B tokens sourced from a pretraining corpus using per-domain upsampling based on sample length. The model was trained for 125 iterations with a sequence length of 1M and a global batch size of 8 (a quick token-count check follows this list).
* Supervised fine-tuning (SFT): 1B tokens of open-source instruction datasets across general, mathematics, and code domains. We subsample the data from the ‘general_sft_stage2’ subset of [AceMath-Instruct](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data).
* Maximum context window: 1M tokens
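As a quick back-of-the-envelope consistency check (not part of the original card), the continued-pretraining numbers above multiply out to the stated 1B tokens:

```python
# 125 iterations x global batch size 8 x 1M-token sequences
iterations, global_batch, seq_len = 125, 8, 1_000_000
print(iterations * global_batch * seq_len)  # 1,000,000,000 tokens, i.e. ~1B
```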

## Evaluation Results

We evaluate Nemotron-UltraLong-8B on a diverse set of benchmarks, including long-context tasks (e.g., RULER, LV-Eval, and InfiniteBench) and standard tasks (e.g., MMLU, MATH, GSM-8K, and HumanEval). UltraLong-8B achieves superior performance on ultra-long context tasks while maintaining competitive results on standard benchmarks.

### Needle in a Haystack

<img width="80%" alt="image" src="Llama-3.1-8B-UltraLong-1M-Instruct.png">

### Long context evaluation

<img width="80%" alt="image" src="long_benchmark.png">

### Standard capability evaluation

<img width="80%" alt="image" src="standard_benchmark.png">

## Correspondence to
Chejian Xu ([email protected]), Wei Ping ([email protected])

## Citation
<pre>
@article{ulralong2025,
  title={From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models},
  author={Xu, Chejian and Ping, Wei and Xu, Peng and Liu, Zihan and Wang, Boxin and Shoeybi, Mohammad and Catanzaro, Bryan},
  journal={arXiv preprint},
  year={2025}
}
</pre>