Update README.md
This repository contains a custom-trained Hindi Causal Language Model designed f…
## Model Description

- **Model Size:** 113M parameters (yes, it's very small!)
- **Architecture:** Custom Transformer (12 layers, hidden=768, 16 heads, ffn=3072, act=swiglu, norm=rmsnorm) based on the `HindiCausalLM` class, with Hindi-specific optimizations:
  - Multi-resolution attention to capture both character-level and word-level patterns
  - Morphology-aware feed-forward layers
  - Script-mix processing for Hindi-English code-mixing
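The `act=swiglu` and `norm=rmsnorm` settings above can be sketched in plain Python to show what each block computes (a minimal illustration with hypothetical helper names, not the repository's actual code, which operates on tensors rather than lists):

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """RMSNorm: divide each element by the root-mean-square of the vector,
    then scale by a learned per-dimension gain (here defaulting to 1.0)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    g = gain if gain is not None else [1.0] * len(x)
    return [gi * v / rms for gi, v in zip(g, x)]

def swish(v):
    """Swish/SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(gate, value):
    """SwiGLU gating: swish(gate) multiplied elementwise with the value branch.
    In the full FFN, gate and value are two hidden->ffn projections of the
    same input, followed by an ffn->hidden down projection."""
    return [swish(g) * v for g, v in zip(gate, value)]
```

Unlike LayerNorm, RMSNorm skips mean-centering and the bias term, which is why it is a popular cheap替… is commonly chosen for its lower cost at the same quality.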
…

- IndicGLUE (30K samples)
- Hindi literature (5K passages)
- **Tokenizer:** SentencePiece, trained on Hindi text with a vocab size of 16,000
- **Training Details:** Trained on 4×L4 GPUs (24GB VRAM each) for 8 hours: 2 epochs, hidden_size=768, num_layers=12, block_size=512, batch_size=64, learning_rate=5e-5, SwiGLU activation, RoPE positional encoding, and RMSNorm
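As a back-of-the-envelope sanity check, the weight matrices implied by the stated hyperparameters can be tallied (this sketch counts transformer-block weights only; it ignores norm gains and biases, and whether the reported 113M includes the ~12.3M embedding matrix is not stated here):

```python
def block_params(hidden=768, ffn=3072):
    """Weight count for one Transformer block with a SwiGLU FFN (no biases)."""
    attn = 4 * hidden * hidden   # Q, K, V, and output projections
    ffw = 3 * hidden * ffn       # gate, up, and down projections of the SwiGLU FFN
    return attn + ffw

total = 12 * block_params()      # num_layers = 12
print(f"{total / 1e6:.1f}M")     # prints 113.2M, consistent with the stated 113M
```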
## How to Use