jiaqiz committed on
Commit 4605d82 · verified · 1 Parent(s): 1a2cb80

Update README.md

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -27,6 +27,7 @@ The model underwent a multi-phase post-training process to enhance both its reas
 
 This model is part of the Llama Nemotron Collection. You can find the other model(s) in this family here:
 - [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)
+- [Llama-3.1-Nemotron-Ultra-253B-v1](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1)
 
 This model is ready for commercial use.
 
@@ -95,6 +96,7 @@ Llama-3.3-Nemotron-Super-49B-v1 is a general purpose reasoning and chat model in
 
 You can try this model out through the preview API, using this link: [Llama-3_3-Nemotron-Super-49B-v1](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1).
 
+### Use It with Transformers
 See the snippet below for usage with [Hugging Face Transformers](https://huggingface.co/docs/transformers/main/en/index) library. Reasoning mode (ON/OFF) is controlled via system prompt. Please see the example below
 
 We recommend using the *transformers* package with version 4.48.3.
@@ -150,6 +152,26 @@ thinking = "off"
 print(pipeline([{"role": "system", "content": f"detailed thinking {thinking}"},{"role": "user", "content": "Solve x*(sin(x)+2)=0"}]))
 ```
 
+### Use It with vLLM
+
+```
+pip install vllm==0.8.3
+```
+An example on how to serve with vLLM:
+```
+python3 -m vllm.entrypoints.openai.api_server \
+--model "nvidia/Llama-3_3-Nemotron-Super-49B-v1" \
+--trust-remote-code \
+--seed=1 \
+--host="0.0.0.0" \
+--port=5000 \
+--served-model-name "nvidia/Llama-3_3-Nemotron-Super-49B-v1" \
+--tensor-parallel-size=8 \
+--max-model-len=32768 \
+--gpu-memory-utilization 0.95 \
+--enforce-eager
+```
+
 ## Inference:
 
 **Engine:**
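
The vLLM command added in this commit exposes an OpenAI-compatible HTTP endpoint, so any standard chat-completions client can talk to it. Below is a minimal sketch of such a client; the host, port, model name, and the `detailed thinking on/off` system-prompt toggle come from the README's examples, while the helper names (`build_request`, `chat`) are hypothetical, not part of the model card:

```python
# Minimal client sketch for the vLLM OpenAI-compatible server started above.
# Uses only the standard library; adjust BASE_URL for your deployment.
import json
import urllib.request

BASE_URL = "http://0.0.0.0:5000/v1"  # matches --host/--port in the serve command
MODEL = "nvidia/Llama-3_3-Nemotron-Super-49B-v1"  # matches --served-model-name


def build_request(prompt: str, thinking: str = "off") -> dict:
    """Build a /v1/chat/completions payload with the reasoning toggle
    expressed via the system prompt, as the README's Transformers example does."""
    assert thinking in ("on", "off")
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": f"detailed thinking {thinking}"},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
    }


def chat(prompt: str, thinking: str = "off") -> str:
    """POST the payload and return the assistant reply.
    Requires the vLLM server above to be running."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_request(prompt, thinking)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Solve x*(sin(x)+2)=0", thinking="on")` would return the model's reply with reasoning mode enabled; `thinking="off"` disables it, mirroring the Transformers snippet in the diff.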