convaiinnovations committed (verified)
Commit 70a712a · 1 parent: 51e1b48

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED

```diff
@@ -16,7 +16,7 @@ This repository contains a custom-trained Hindi Causal Language Model designed f
 ## Model Description
 - **Model Size:** 113M (YAH !!! Its very small)

-- **Architecture:** Custom Transformer (12 layers, hidden=768, 16 heads, ffn=3072, act=geglu, norm=rmsnorm) based on the `HindiCausalLM` class with Hindi-specific optimizations:
+- **Architecture:** Custom Transformer (12 layers, hidden=768, 16 heads, ffn=3072, act=swiglu, norm=rmsnorm) based on the `HindiCausalLM` class with Hindi-specific optimizations:
 - Multi-resolution attention to capture both character-level and word-level patterns
 - Morphology-aware feed-forward layers
 - Script-mix processing for Hindi-English code-mixing
@@ -32,7 +32,7 @@ This repository contains a custom-trained Hindi Causal Language Model designed f
 - IndicGLUE (30K samples)
 - Hindi literature (5K passages)
 - **Tokenizer:** SentencePiece trained on Hindi text with vocab size of 16,000
-- **Training Details:** Trained on 4xL4 24GB VRAM GPUs for 8 hours. 2 epochs, hidden size=768, num_layers=12, block_size=512, batch_size=64, learning_rate=5e-5, geglu activation, rope positional encoding, and rms normalization
+- **Training Details:** Trained on 4xL4 24GB VRAM GPUs for 8 hours. 2 epochs, hidden size=768, num_layers=12, block_size=512, batch_size=64, learning_rate=5e-5, swiglu activation, rope positional encoding, and rms normalization

 ## How to Use
```
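The only substantive change in this commit is swapping the feed-forward gate activation from GEGLU to SwiGLU: both are gated linear units, differing only in whether the gate branch uses GELU or SiLU (Swish). A minimal scalar sketch of the difference, under the assumption of standard definitions — this is not the repository's actual `HindiCausalLM` feed-forward code, which applies these element-wise to 3072-dimensional projections:

```python
import math

def gelu(x: float) -> float:
    # GELU, tanh approximation (used by the GEGLU gate in the old README)
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def silu(x: float) -> float:
    # SiLU / Swish: x * sigmoid(x) (used by the SwiGLU gate after this commit)
    return x / (1.0 + math.exp(-x))

def gated_unit(gate: float, value: float, act) -> float:
    # Gated linear unit: act(W_g x) * (W_v x); scalars stand in for the
    # two linear projections of the feed-forward block
    return act(gate) * value

# GEGLU and SwiGLU share the same structure; only the gate activation differs
geglu_out = gated_unit(1.0, 2.0, gelu)
swiglu_out = gated_unit(1.0, 2.0, silu)
```

In a real transformer feed-forward layer the gate and value would be two separate linear projections of the same hidden state, followed by a down-projection back to the model dimension.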