Update README.md
README.md CHANGED
@@ -42,30 +42,26 @@ When benchmarked against leading models like Gemma-2B, LLaMA-3.2-3B, and Sarvam-
 <img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfOWAfktE9_XdRl7UY-8tCBaY1n-myJb9UQvIKBnsagD3hBpOu28fi5LGupKjM6o-CxvozuPpGYATk0aRBDFNADwAfy8uB4S1M9SPycWDDf1VmV5Co9KPXR1_FMMAFV54DkB6uO?key=Z4vPtKGJIGf83PmLrJX9RY3I">
 </div>
 
+
+### Training results - Multilingual Task Performance Comparison
+
+| Language | Hellaswag | ARC-c | ARC-e | MMLU | BoolQ |
+|------------|-----------|--------|--------|--------|--------|
+| Hindi | 47.85 | 36.68 | 52.14 | 46.73 | 57.61 |
+| Tamil | 49.45 | 38.65 | 53.45 | 44.71 | 45.87 |
+| Telugu | 50.84 | 37.96 | 53.36 | 46.85 | 51.89 |
+| Kannada | 52.16 | 38.31 | 53.11 | 46.38 | 52.32 |
+| Malayalam | 46.32 | 29.60 | 40.86 | 43.63 | 46.69 |
+
 ### Training hyperparameters
 
 
 The following hyperparameters were used during training:
 - learning_rate: 1e-05
-- train_batch_size: 8
-- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 7
-- total_train_batch_size: 56
-- total_eval_batch_size: 56
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 3.0
 
-### Training results - Multilingual Task Performance Comparison
-
-| Language | Hellaswag | ARC-c | ARC-e | MMLU | BoolQ |
-|------------|-----------|--------|--------|--------|--------|
-| Hindi | 47.85 | 36.68 | 52.14 | 46.73 | 57.61 |
-| Tamil | 49.45 | 38.65 | 53.45 | 44.71 | 45.87 |
-| Telugu | 50.84 | 37.96 | 53.36 | 46.85 | 51.89 |
-| Kannada | 52.16 | 38.31 | 53.11 | 46.38 | 52.32 |
-| Malayalam | 46.32 | 29.60 | 40.86 | 43.63 | 46.69 |
 
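The hyperparameters listed above (including the batch-size and optimizer lines removed by this commit) map closely onto Hugging Face `TrainingArguments`. The sketch below is only an illustration of how such a configuration could be written, not the training script used for this model; `output_dir`, the script name, and the launch command are placeholders.

```python
# Illustrative sketch only: mirrors the hyperparameters listed in the README,
# including the lines removed by this commit. Not the actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",        # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=8,     # "train_batch_size: 8" (removed line)
    per_device_eval_batch_size=8,      # "eval_batch_size: 8" (removed line)
    seed=42,
    optim="adamw_torch",               # AdamW with default betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)

# Launched on 7 GPUs (e.g. `torchrun --nproc_per_node=7 train.py`, where
# train.py is a hypothetical script name), the effective train batch size is
# 7 * 8 = 56, matching the removed "total_train_batch_size: 56" line.
```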