dittops committed (verified)
Commit 62de7e6 · 1 parent: 6ad379a

Update README.md

Files changed (1): README.md (+11 −15)

README.md CHANGED
@@ -42,30 +42,26 @@ When benchmarked against leading models like Gemma-2B, LLaMA-3.2-3B, and Sarvam-
  <img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfOWAfktE9_XdRl7UY-8tCBaY1n-myJb9UQvIKBnsagD3hBpOu28fi5LGupKjM6o-CxvozuPpGYATk0aRBDFNADwAfy8uB4S1M9SPycWDDf1VmV5Co9KPXR1_FMMAFV54DkB6uO?key=Z4vPtKGJIGf83PmLrJX9RY3I">
  </div>

+
+ ### Training results - Multilingual Task Performance Comparison
+
+ | Language | Hellaswag | ARC-c | ARC-e | MMLU | BoolQ |
+ |------------|-----------|--------|--------|--------|--------|
+ | Hindi | 47.85 | 36.68 | 52.14 | 46.73 | 57.61 |
+ | Tamil | 49.45 | 38.65 | 53.45 | 44.71 | 45.87 |
+ | Telugu | 50.84 | 37.96 | 53.36 | 46.85 | 51.89 |
+ | Kannada | 52.16 | 38.31 | 53.11 | 46.38 | 52.32 |
+ | Malayalam | 46.32 | 29.60 | 40.86 | 43.63 | 46.69 |
+
  ### Training hyperparameters


  The following hyperparameters were used during training:
  - learning_rate: 1e-05
- - train_batch_size: 8
- - eval_batch_size: 8
  - seed: 42
  - distributed_type: multi-GPU
- - num_devices: 7
- - total_train_batch_size: 56
- - total_eval_batch_size: 56
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 3.0

- ### Training results - Multilingual Task Performance Comparison
-
- | Language | Hellaswag | ARC-c | ARC-e | MMLU | BoolQ |
- |------------|-----------|--------|--------|--------|--------|
- | Hindi | 47.85 | 36.68 | 52.14 | 46.73 | 57.61 |
- | Tamil | 49.45 | 38.65 | 53.45 | 44.71 | 45.87 |
- | Telugu | 50.84 | 37.96 | 53.36 | 46.85 | 51.89 |
- | Kannada | 52.16 | 38.31 | 53.11 | 46.38 | 52.32 |
- | Malayalam | 46.32 | 29.60 | 40.86 | 43.63 | 46.69 |

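As a quick sanity check on the results table in the diff, the per-language averages across the five benchmarks can be computed directly from the README's numbers. This is only an illustrative sketch; the `scores` dict simply restates the table, and the averaging is not part of the commit itself:

```python
# Rows restate the README's results table; columns are, in order:
# Hellaswag, ARC-c, ARC-e, MMLU, BoolQ.
scores = {
    "Hindi":     [47.85, 36.68, 52.14, 46.73, 57.61],
    "Tamil":     [49.45, 38.65, 53.45, 44.71, 45.87],
    "Telugu":    [50.84, 37.96, 53.36, 46.85, 51.89],
    "Kannada":   [52.16, 38.31, 53.11, 46.38, 52.32],
    "Malayalam": [46.32, 29.60, 40.86, 43.63, 46.69],
}

# Unweighted mean over the five benchmarks for each language.
averages = {lang: round(sum(vals) / len(vals), 2) for lang, vals in scores.items()}
```

By this simple unweighted mean, Kannada comes out strongest of the five languages and Malayalam weakest, matching the pattern visible in the table.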
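The scheduler settings that survive the diff (`lr_scheduler_type: cosine`, `lr_scheduler_warmup_ratio: 0.1`, `learning_rate: 1e-05`) describe a linear warmup followed by cosine decay. The sketch below mirrors the shape of `transformers`' `get_cosine_schedule_with_warmup` under those settings; it is an illustration of the schedule, not code from the training run:

```python
import math

def lr_at_step(step: int, total_steps: int,
               base_lr: float = 1e-05, warmup_ratio: float = 0.1) -> float:
    """LR schedule implied by the README's hyperparameters: linear warmup
    over the first warmup_ratio of steps, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the warmup occupies the first 100 steps, the learning rate peaks at 1e-05 at step 100, and it decays to roughly zero by the final step.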