Update README.md
This repository contains a custom-trained Hindi Causal Language Model designed f…
## Model Description

- **Model Size:** 113M parameters (yes, it's very small!)
- **Architecture:** Custom Transformer (12 layers, hidden=768, 16 heads, ffn=3072, act=swiglu, norm=rmsnorm) based on the `HindiCausalLM` class, with Hindi-specific optimizations:
  - Multi-resolution attention to capture both character-level and word-level patterns
  - Morphology-aware feed-forward layers
  - Script-mix processing for Hindi-English code-mixing
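The `act=swiglu` and `norm=rmsnorm` settings above can be sketched in plain Python to show what each block computes (a minimal illustration with hypothetical helper names, not the repository's actual code, which operates on tensors rather than lists):

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """RMSNorm: divide each element by the root-mean-square of the vector,
    then scale by a learned per-dimension gain (here defaulting to 1.0)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    g = gain if gain is not None else [1.0] * len(x)
    return [gi * v / rms for gi, v in zip(g, x)]

def swish(v):
    """Swish/SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(gate, value):
    """SwiGLU gating: swish(gate) multiplied elementwise with the value branch.
    In the full FFN, gate and value are two hidden->ffn projections of the
    same input, followed by an ffn->hidden down projection."""
    return [swish(g) * v for g, v in zip(gate, value)]
```

Unlike LayerNorm, RMSNorm skips mean-centering and the bias term, which is why it is a popular cheap替… is commonly chosen for its lower cost at the same quality.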
…

- IndicGLUE (30K samples)
- Hindi literature (5K passages)
- **Tokenizer:** SentencePiece, trained on Hindi text with a vocab size of 16,000
- **Training Details:** Trained on 4×L4 GPUs (24GB VRAM each) for 8 hours: 2 epochs, hidden_size=768, num_layers=12, block_size=512, batch_size=64, learning_rate=5e-5, SwiGLU activation, RoPE positional encoding, and RMSNorm
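As a back-of-the-envelope sanity check, the weight matrices implied by the stated hyperparameters can be tallied (this sketch counts transformer-block weights only; it ignores norm gains and biases, and whether the reported 113M includes the ~12.3M embedding matrix is not stated here):

```python
def block_params(hidden=768, ffn=3072):
    """Weight count for one Transformer block with a SwiGLU FFN (no biases)."""
    attn = 4 * hidden * hidden   # Q, K, V, and output projections
    ffw = 3 * hidden * ffn       # gate, up, and down projections of the SwiGLU FFN
    return attn + ffw

total = 12 * block_params()      # num_layers = 12
print(f"{total / 1e6:.1f}M")     # prints 113.2M, consistent with the stated 113M
```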
## How to Use