---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-4B-Base
tags:
  - llm
  - indic
model-index:
  - name: Hex-1
    results: []
language:
  - hi
  - te
  - ta
  - ml
  - kn
---

# Hex-1

Hex-1 is a 4-billion-parameter language model optimized for Indian languages. It supports five major Indian languages: Hindi, Kannada, Telugu, Tamil, and Malayalam. When benchmarked against leading models such as Gemma-2B, LLaMA-3.2-3B, and Sarvam-1, Hex-1 delivers best-in-class performance in all five supported languages on the MMLU benchmark.
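
Since Hex-1 ships as a standard `transformers` checkpoint (built on `Qwen/Qwen3-4B-Base`), a minimal generation sketch might look like the following. The repo id is a placeholder, so substitute the actual Hub id for this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/hex-1"  # placeholder: replace with the actual Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # requires `accelerate`
)

prompt = "भारत की राजधानी"  # "The capital of India" in Hindi
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```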

## Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):

- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 7
- total_train_batch_size: 56
- total_eval_batch_size: 56
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
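
As a rough illustration, here is how the hyperparameters above could map onto `transformers`' `TrainingArguments`. This is a sketch reconstructed from the list, not the authors' training script; the output path is hypothetical and everything not listed is left at its default.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hex-1-checkpoints",   # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",              # AdamW with betas=(0.9, 0.999), eps=1e-8 (the defaults)
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)
# With 7 GPUs, the effective (total) train/eval batch size is 7 * 8 = 56.
```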

## Training results

Multilingual task performance comparison across the five supported languages:

| Language  | HellaSwag | ARC-c | ARC-e | MMLU  | BoolQ |
|-----------|-----------|-------|-------|-------|-------|
| Hindi     | 47.85     | 36.68 | 52.14 | 46.73 | 57.61 |
| Tamil     | 49.45     | 38.65 | 53.45 | 44.71 | 45.87 |
| Telugu    | 50.84     | 37.96 | 53.36 | 46.85 | 51.89 |
| Kannada   | 52.16     | 38.31 | 53.11 | 46.38 | 52.32 |
| Malayalam | 46.32     | 29.60 | 40.86 | 43.63 | 46.69 |
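
To run this style of evaluation yourself, a hedged sketch using EleutherAI's lm-evaluation-harness (`pip install lm-eval`) follows. The card does not state which harness tasks or Indic translations were used, so the task names and repo id below are placeholders to be replaced with the identifiers that match your setup.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/hex-1",                  # placeholder repo id
    tasks=["hellaswag_hi", "arc_challenge_hi", "mmlu_hi"],  # hypothetical Indic task names
    batch_size=8,
)
print(results["results"])
```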

## Framework versions

- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1