Update README.md
README.md (CHANGED)
@@ -165,8 +165,6 @@ More details on the datasets and synthetic data generation methods can be found

## Private Non-publicly Accessible Datasets of Third Parties

-\*List of 'main/large' private data sets acquired from other third parties (above 5% of the overall data in this category), unique identifiers and links (if available) and narrative description \+ period of collection\*
-
| Dataset |
| :---- |
| Global Regulation |
@@ -314,25 +312,25 @@ We evaluated our model on the following benchmarks:

| Task | N-Nano-V2 12B Base | | N-Nano-V2 9B Base | Qwen3 8B Base | Gemma3 12B Base |
| :---- | :---- | :---- | :---- | :---- | :---- |
| **General** | | | | | |
-| MMLU | 78.24 | | 74.53 | 76.44 | 73.61 |
-| MMLU-Pro 5-shot | 63.98 | |
-| AGIEval English CoT | 68.03 | |
-| Math | | | | | |
-| GSM8K CoT | 91.66 | |
-|
-| MATH Level 5 | 67.61 | | **63.64** | 29.91 | 17.71 |
-| AIME 2024 avg@32 | 56.67 | | 30.00 | 20.00 | 16.67 |
+| MMLU | **78.24** | | 74.53 | 76.44 | 73.61 |
+| MMLU-Pro 5-shot | **63.98** | | 59.43 | 56.27 | 45.12 |
+| AGIEval English CoT | **68.03** | | 65.28 | 59.54 | 51.69 |
+| **Math** | | | | | |
+| GSM8K CoT | **91.66** | | 91.36 | 84.00 | 74.45 |
+| Math | **83.54** | | 80.50 | 55.40 | 42.40 |
+| MATH Level 5 | **67.61** | | **63.64** | 29.91 | 17.71 |
+| AIME 2024 avg@32 | **56.67** | | 30.00 | 20.00 | 16.67 |
| **Code** | | | | | |
-| HumanEval+ Pass@1 | 61.03 | | 58.50 | 57.55 | 36.68 |
-| MBPP+ Pass@1 | 61.55 | | 58.95 | 58.56 | 51.73 |
+| HumanEval+ Pass@1 | **61.03** | | 58.50 | 57.55 | 36.68 |
+| MBPP+ Pass@1 | **61.55** | | 58.95 | 58.56 | 51.73 |
| **Commonsense Understanding** | | | | | |
-| ARC Challenge | 93.26 | | 90.70 |
+| ARC Challenge | **93.26** | | 90.70 | 93.09 | 90.44 |
| HellaSwag | 84.00 | | 79.90 | 79.75 | **84.15** |
-| OpenBookQA | 46.00 | | 44.80 | 42.00 | **46.00** |
-| PIQA | 82.54 | | 81.83 | 79.43 |
+| OpenBookQA | **46.00** | | 44.80 | 42.00 | **46.00** |
+| PIQA | **82.54** | | 81.83 | 79.43 | 82.10 |
| WinoGrande | 79.24 | | 75.30 | 75.93 | **79.95** |
| **Long Context** | | | | | |
-| RULER-128K | 84.74 | |
+| RULER-128K | **84.74** | | 82.22 | \- | 80.70 |

*Table 1: Accuracy of Nemotron-Nano-V2-Base models versus existing SoTA models. N-Nano-V2 is short for Nemotron-Nano-V2. The distilled N-Nano-V2-9B-Base is compared against Qwen3-8B-Base and Gemma3-12B-Base, and the best score is highlighted in each row.*
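A note on the score types in Table 1: *avg@32* means each problem is attempted 32 times and the per-attempt accuracies are averaged, while *Pass@1* is the standard unbiased pass@k estimator (Chen et al., 2021) evaluated at k=1. A minimal sketch of both, assuming the usual definitions rather than the exact evaluation harness used here:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): n samples drawn,
    c of them passing, evaluated for a budget of k."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def avg_at_k(attempts: list[list[bool]]) -> float:
    """avg@k: per-problem accuracy over k attempts, then averaged
    across problems (each inner list holds one problem's k outcomes)."""
    return sum(sum(a) / len(a) for a in attempts) / len(attempts)

# Illustrative data: one problem solved in 19/32 attempts, another in 0/32.
print(avg_at_k([[True] * 19 + [False] * 13, [False] * 32]))  # 0.296875
print(pass_at_k(n=32, c=19, k=1))                            # 0.59375
```

Note that pass@1 estimated from n samples reduces to c/n, i.e. the mean accuracy over the samples.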
@@ -340,20 +338,20 @@ We evaluated our model on the following benchmarks:

| :---- | :---- | :---- | :---- | :---- | :---- |
| **Global-MMLU-Lite** | | | | | |
| German | 74.50 | | 68.25 | **75.50** | 69.75 |
-| Spanish | 76.50 | | 72.75 |
-| French | 78.25 | | 69.75 |
-| Italian | 76.50 | | 73.25 | 72.75 |
+| Spanish | **76.50** | | 72.75 | 75.00 | 74.00 |
+| French | **78.25** | | 69.75 | 74.25 | 72.50 |
+| Italian | **76.50** | | 73.25 | 72.75 | 74.00 |
| Japanese | 71.00 | | 67.00 | 70.00 | **71.50** |
-| Korean | 72.50 | | 67.25 | 67.25 |
-| Portuguese | 76.25 | | 71.25 | 72.50 |
-| Chinese | 75.50 | | 69.25 |
-| Average | 75.13 | | 69.94 |
+| Korean | **72.50** | | 67.25 | 67.25 | 70.25 |
+| Portuguese | **76.25** | | 71.25 | 72.50 | 75.75 |
+| Chinese | **75.50** | | 69.25 | 75.25 | 67.25 |
+| Average | **75.13** | | 69.94 | 72.81 | 71.88 |
| **Multilingual Math (MGSM)** | | | | | |
-| Spanish | 93.20 | |
-| German | 89.60 | |
-| French | 86.40 | |
+| Spanish | **93.20** | | 91.60 | 86.40 | 74.00 |
+| German | **89.60** | | 89.60 | 78.80 | 68.80 |
+| French | **86.40** | | 86.00 | 78.80 | 70.80 |
| Chinese | 44.40 | | **75.20** | 28.80 | 26.80 |
-| Japanese | 76.00 | |
+| Japanese | **76.00** | | 74.80 | 30.80 | 26.40 |
| Russian | 90.40 | | **91.60** | 83.60 | 76.00 |
| Average | 80.00 | | **84.80** | 64.53 | 57.13 |
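The *Average* rows in both blocks are plain arithmetic means of the per-language scores above them. A quick check against the MGSM block, with the 12B and 9B columns copied from the table:

```python
# Per-language MGSM scores from the table: es, de, fr, zh, ja, ru
mgsm_12b = [93.20, 89.60, 86.40, 44.40, 76.00, 90.40]
mgsm_9b  = [91.60, 89.60, 86.00, 75.20, 74.80, 91.60]

print(round(sum(mgsm_12b) / len(mgsm_12b), 2))  # 80.0  -> "Average | 80.00"
print(round(sum(mgsm_9b) / len(mgsm_9b), 2))    # 84.8  -> "Average | 84.80"
```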
@@ -433,4 +431,5 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.

| Description of any methods implemented in data acquisition or processing, if any, to address illegal or harmful content in the training data, including, but not limited to, child sexual abuse material (CSAM) and non-consensual intimate imagery (NCII) | We used a Gemma-3 4B-based content-safety guard model trained on [Nemotron Content Safety Dataset v2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) to exclude potentially illegal or harmful content from the training data. |
| Use Case Restrictions: | GA: Abide by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
| Model and dataset restrictions: | The principle of least privilege (PoLP) is applied to limit access during dataset generation and model development. Dataset access restrictions are enforced during training, and dataset license constraints are adhered to. |
-| This AI model was developed based on our policies to ensure responsible data handling and risk mitigation. The datasets used for training have been scanned for harmful content and illegal content, consistent with our policies including scanning for Child Sexual Abuse Material (CSAM). Ongoing review and monitoring mechanisms are in place based on our policies and to maintain data integrity. | True. We use [Nemotron Content Safety Dataset V2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) and an internal safety dataset specialized for minority sexuality for content safety evaluation to ensure the safety of this model. |
+| This AI model was developed based on our policies to ensure responsible data handling and risk mitigation. The datasets used for training have been scanned for harmful and illegal content, consistent with our policies, including scanning for Child Sexual Abuse Material (CSAM). Ongoing review and monitoring mechanisms are in place, based on our policies, to maintain data integrity. | True. We use the [Nemotron Content Safety Dataset V2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) and an internal safety dataset specialized for minority sexuality for content-safety evaluation to ensure the safety of this model. |
+
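On the CSAM/NCII mitigation row above: the README states that a Gemma-3 4B-based guard model trained on Nemotron Content Safety Dataset v2 was used to exclude harmful content from the training data, but it does not publish the pipeline. The sketch below shows one conventional way to wire up such a classifier-based filter with Hugging Face `transformers`; the checkpoint name, labels, and batch size are illustrative assumptions, not NVIDIA's actual implementation (a generative guard prompted for a safety verdict is an equally common design):

```python
from transformers import pipeline

# Hypothetical checkpoint; the actual NVIDIA guard model is not public.
GUARD_MODEL = "example-org/gemma-3-4b-safety-guard"

# Assumes the guard is exposed as a sequence classifier that emits
# labels such as "safe" / "unsafe" for each input document.
guard = pipeline("text-classification", model=GUARD_MODEL)

def filter_training_docs(docs: list[str], unsafe_label: str = "unsafe") -> list[str]:
    """Drop any document the guard model flags as unsafe."""
    preds = guard(docs, batch_size=32, truncation=True)
    return [doc for doc, p in zip(docs, preds) if p["label"] != unsafe_label]

corpus = ["An article about maritime law.", "A recipe for sourdough bread."]
print(len(filter_training_docs(corpus)))  # number of documents kept
```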