This model is part of the Llama Nemotron Collection. You can find the other models in this family here:

- [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)
- [Llama-3.1-Nemotron-Ultra-253B-v1](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1)

This model is ready for commercial use.
You can try this model out through the preview API, using this link: [Llama-3_3-Nemotron-Super-49B-v1](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1).

### Use It with Transformers

See the snippet below for usage with the [Hugging Face Transformers](https://huggingface.co/docs/transformers/main/en/index) library. Reasoning mode (ON/OFF) is controlled via the system prompt.

We recommend using the *transformers* package with version 4.48.3.
```
import torch
import transformers

model_id = "nvidia/Llama-3_3-Nemotron-Super-49B-v1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Reasoning mode is toggled via the system prompt: "on" or "off"
thinking = "off"

print(pipeline([{"role": "system", "content": f"detailed thinking {thinking}"},{"role": "user", "content": "Solve x*(sin(x)+2)=0"}]))
```
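Since reasoning mode is switched purely through the system prompt, toggling it amounts to changing one string. A minimal sketch (the helper name here is ours, not part of the model card):

```python
def reasoning_system_message(thinking: str) -> dict:
    # Build the system message that toggles Nemotron reasoning mode ("on"/"off").
    assert thinking in ("on", "off")
    return {"role": "system", "content": f"detailed thinking {thinking}"}

# The same user turn can then be run in either mode:
question = {"role": "user", "content": "Solve x*(sin(x)+2)=0"}
messages_on = [reasoning_system_message("on"), question]
messages_off = [reasoning_system_message("off"), question]
print(messages_off[0]["content"])  # detailed thinking off
```

Either message list can be passed directly to the pipeline above.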

### Use It with vLLM

```
pip install vllm==0.8.3
```

An example of how to serve the model with vLLM:
```
python3 -m vllm.entrypoints.openai.api_server \
    --model "nvidia/Llama-3_3-Nemotron-Super-49B-v1" \
    --trust-remote-code \
    --seed=1 \
    --host="0.0.0.0" \
    --port=5000 \
    --served-model-name "nvidia/Llama-3_3-Nemotron-Super-49B-v1" \
    --tensor-parallel-size=8 \
    --max-model-len=32768 \
    --gpu-memory-utilization 0.95 \
    --enforce-eager
```
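Once the server is up it exposes an OpenAI-compatible endpoint on the host/port configured above. A minimal client sketch using only the Python standard library (the endpoint path and payload shape follow the OpenAI chat-completions convention that vLLM implements; reasoning mode is again set through the system prompt):

```python
import json
import urllib.request

# Chat request for the vLLM server launched above.
payload = {
    "model": "nvidia/Llama-3_3-Nemotron-Super-49B-v1",
    "messages": [
        {"role": "system", "content": "detailed thinking off"},
        {"role": "user", "content": "Solve x*(sin(x)+2)=0"},
    ],
    "max_tokens": 512,
}

def query(url="http://0.0.0.0:5000/v1/chat/completions"):
    # POST the request and return the assistant's reply text.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# query() returns the assistant message once the server from the previous
# snippet is running.
```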

## Inference:

**Engine:**