---
license: mit
language:
- hi
- en
pipeline_tag: translation
tags:
- SVECTOR
- LLM
- Hindi
- India
---

# Akshara-2B-Hindi Language Model

## Overview

Akshara-2B-Hindi is a 2-billion-parameter language model optimized for Hindi and English text processing. Built on a decoder-only causal architecture, it delivers strong performance on natural language understanding and generation tasks in both languages.

## Model Architecture

### Core Specifications

- **Base Architecture**: AksharaForCausalLM
- **Model Type**: Causal Language Model (`akshara`)
- **Hidden Size**: 2048
- **Number of Layers**: 18
- **Attention Heads**: 8
- **Key-Value Heads**: 1 (multi-query attention)
- **Intermediate Size**: 16384
- **Head Dimension**: 256
- **Vocabulary Size**: 256,000 tokens
- **Maximum Sequence Length**: 8,192 tokens
- **Parameters**: ~2 billion

### Technical Details

- **Attention Mechanism**:
  - Attention Bias: Disabled
  - Attention Dropout: 0.0
- **RMS Norm Epsilon**: 1e-06
- **Initializer Range**: 0.02
- **RoPE Theta**: 10000.0
- **RoPE Scaling**: None

### Special Tokens

- **BOS Token ID**: 2
- **EOS Token ID**: 1
- **PAD Token ID**: 0

### Implementation Details

- **Activation Function**: GELU
- **Model Dtype**: float16
- **Cache Usage**: Enabled
- **Transformers Version**: 4.38.1
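
The specifications above map onto standard Hugging Face config fields, so they can be checked directly from the published config. The attribute names below follow common `transformers` conventions (e.g. `num_key_value_heads`) and are an assumption about this model's config class; `trust_remote_code=True` may also be needed if the `akshara` architecture is not bundled with your `transformers` version.

```python
from transformers import AutoConfig

# May require trust_remote_code=True for a custom architecture (assumption).
config = AutoConfig.from_pretrained("SVECTOR-CORPORATION/Akshara-2B-Hindi")

# Attribute names follow common transformers conventions; assumed to apply here.
print(config.hidden_size)              # expected: 2048
print(config.num_hidden_layers)        # expected: 18
print(config.num_attention_heads)      # expected: 8
print(config.num_key_value_heads)      # expected: 1
print(config.vocab_size)               # expected: 256000
print(config.max_position_embeddings)  # expected: 8192
```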

## Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SVECTOR-CORPORATION/Akshara-2B-Hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Weights are stored in float16; loading in the same dtype halves memory use.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
```

### Basic Text Generation

```python
text = "आज का मौसम"  # Example Hindi prompt: "Today's weather"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask together
    max_length=100,
    temperature=0.7,
    do_sample=True,
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
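
For quick experiments, the model and tokenizer loaded above can also be wrapped in the high-level `pipeline` API. This is generic `transformers` usage rather than anything Akshara-specific; the Hindi prompt is just an illustrative example.

```python
from transformers import pipeline

# Reuse the model and tokenizer objects created in "Loading the Model".
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(
    "भारत की राजधानी",  # "The capital of India"
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```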

## Features

- **Bilingual Capabilities**: Optimized for both Hindi and English text processing
- **Long Context Window**: Supports sequences up to 8,192 tokens (see the truncation sketch after this list)
- **Efficient Architecture**: Uses a single key-value head for attention (multi-query attention), reducing memory and compute during inference
- **Large Vocabulary**: 256,000-token vocabulary covering diverse Hindi and English text
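
Because the context window tops out at 8,192 tokens, longer documents should be truncated or chunked at tokenization time. A minimal sketch using standard tokenizer arguments (the document variable is a placeholder):

```python
long_text = "..."  # placeholder for a long Hindi/English document

inputs = tokenizer(
    long_text,
    return_tensors="pt",
    truncation=True,
    max_length=8192,  # matches the model's maximum sequence length
)
```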

## Performance Considerations

- Model weights are stored in float16 format for efficient memory usage (a typical setup is sketched below)
- Attention computation is simplified by disabling bias terms
- Key-value caching is supported for faster autoregressive inference
- Uses RMSNorm with an epsilon of 1e-06 for stable training
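
Putting these points together, a typical GPU inference setup might look like the following. This is a sketch under stated assumptions, not an official recipe: `device_map="auto"` assumes the `accelerate` package is installed, and `use_cache=True` is already the default in `transformers`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SVECTOR-CORPORATION/Akshara-2B-Hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # matches the stored weight dtype
    device_map="auto",          # requires `pip install accelerate` (assumption)
)
model.eval()

inputs = tokenizer("नमस्ते", return_tensors="pt").to(model.device)  # "Hello"
# use_cache=True (the default) reuses past key/value states across decode
# steps; with a single KV head the cache footprint stays small.
outputs = model.generate(**inputs, max_new_tokens=50, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```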

## Limitations

- Maximum context length of 8,192 tokens
- Primarily optimized for Hindi and English; other languages are not a focus
- float16 precision may affect numerical stability in some cases

## Citation

If you use this model in your research, please cite:

```
@misc{akshara2b2025,
  title={Akshara-2B-Hindi: A Bilingual Language Model},
  author={SVECTOR CORPORATION},
  year={2025},
}
```

## [Akshara](https://www.svector.co.in/akshara)

## License

MIT License

Copyright (c) 2025 SVECTOR CORPORATION