---
license: mit
language:
- hi
- en
pipeline_tag: translation
tags:
- SVECTOR
- LLM
- Hindi
- India
---
# Akshara-2B-Hindi Language Model

## Overview
Akshara-2B-Hindi is a 2-billion-parameter causal language model optimized for Hindi and English text processing, targeting natural language understanding and generation tasks in both languages.

## Model Architecture

### Core Specifications
- Base Architecture: AksharaForCausalLM
- Model Type: Causal Language Model (akshara)
- Hidden Size: 2048
- Number of Layers: 18
- Attention Heads: 8
- Key-Value Heads: 1
- Intermediate Size: 16384
- Head Dimension: 256
- Vocabulary Size: 256,000 tokens
- Maximum Sequence Length: 8,192 tokens
- Parameters: ~2 billion (see the sanity check below)
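The ~2B figure can be roughly recovered from the numbers above. The sketch below assumes tied input/output embeddings and a gated feed-forward block (neither is stated in this card), so treat it as an order-of-magnitude check rather than an exact count:

```python
# Rough parameter count from the specs above. Assumptions (not stated in
# this card): tied input/output embeddings and a gated feed-forward block.
hidden, layers, heads, kv_heads = 2048, 18, 8, 1
head_dim, inter, vocab = 256, 16384, 256_000

embed = vocab * hidden                         # token embeddings, ~524M
attn = (hidden * heads * head_dim              # query projection
        + 2 * hidden * kv_heads * head_dim     # shared key/value head
        + heads * head_dim * hidden)           # output projection, ~9.4M total
mlp = 3 * hidden * inter                       # gate, up, down projections, ~100M
total = embed + layers * (attn + mlp)
print(f"~{total / 1e9:.2f}B parameters")       # ~2.51B with these assumptions
```

Excluding the large embedding table leaves roughly 1.98B transformer parameters, consistent with the "2B" in the model name.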
### Technical Details

- Attention Mechanism:
  - Attention Bias: Disabled
  - Attention Dropout: 0.0
- RMS Norm Epsilon: 1e-06
- Initializer Range: 0.02
- RoPE Theta: 10000.0
- RoPE Scaling: None
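For reference, the frequency schedule implied by theta = 10000.0 and a head dimension of 256 looks as follows. This is the textbook RoPE formulation; the model's internal implementation may differ in detail:

```python
import torch

# Standard RoPE inverse frequencies for theta=10000.0 and head_dim=256.
theta, head_dim = 10000.0, 256
inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
positions = torch.arange(8192)                # full 8,192-token context
angles = torch.outer(positions, inv_freq)     # (seq_len, head_dim // 2)
cos, sin = angles.cos(), angles.sin()         # rotations applied to query/key pairs
```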
### Special Tokens
- BOS Token ID: 2
- EOS Token ID: 1
- PAD Token ID: 0
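You can confirm these IDs against the published tokenizer; this assumes the checkpoint on the Hub exposes them through the standard attributes:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SVECTOR-CORPORATION/Akshara-2B-Hindi")
# Expected per the list above: PAD=0, EOS=1, BOS=2
print(tokenizer.pad_token_id, tokenizer.eos_token_id, tokenizer.bos_token_id)
```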
### Implementation Details
- Activation Function: GELU
- Model Dtype: float16
- Cache Usage: Enabled
- Transformers Version: 4.38.1
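Taken together, the settings above map onto a standard transformers configuration. The dictionary below is a hedged reconstruction using conventional field names; the actual config.json shipped with the checkpoint may differ:

```python
# Reconstruction of the likely config.json, using the values listed in this
# card and standard transformers field names (illustrative, not authoritative).
config = {
    "architectures": ["AksharaForCausalLM"],
    "model_type": "akshara",
    "hidden_size": 2048,
    "num_hidden_layers": 18,
    "num_attention_heads": 8,
    "num_key_value_heads": 1,
    "intermediate_size": 16384,
    "head_dim": 256,
    "vocab_size": 256000,
    "max_position_embeddings": 8192,
    "attention_bias": False,
    "attention_dropout": 0.0,
    "rms_norm_eps": 1e-06,
    "initializer_range": 0.02,
    "rope_theta": 10000.0,
    "rope_scaling": None,
    "bos_token_id": 2,
    "eos_token_id": 1,
    "pad_token_id": 0,
    "hidden_act": "gelu",
    "torch_dtype": "float16",
    "use_cache": True,
    "transformers_version": "4.38.1",
}
```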
## Usage

### Installation

```bash
pip install transformers
```
### Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SVECTOR-CORPORATION/Akshara-2B-Hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
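Because the weights are stored in float16 (see Implementation Details), you can also load them directly in that precision and let the library place them on an available GPU; device_map="auto" requires the accelerate package:

```python
import torch
from transformers import AutoModelForCausalLM

# Load in the checkpoint's native float16 precision and place weights
# automatically on GPU/CPU (requires `pip install accelerate`).
model = AutoModelForCausalLM.from_pretrained(
    "SVECTOR-CORPORATION/Akshara-2B-Hindi",
    torch_dtype=torch.float16,
    device_map="auto",
)
```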
### Basic Text Generation

```python
text = "आज का मौसम"  # Example Hindi prompt: "Today's weather"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,          # passes input_ids and attention_mask together
    max_length=100,
    temperature=0.7,
    do_sample=True,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
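For interactive use, transformers' TextStreamer prints tokens as they are generated rather than waiting for the full sequence; the prompt below is only an example:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as generation proceeds.
streamer = TextStreamer(tokenizer, skip_special_tokens=True)
inputs = tokenizer("भारत की राजधानी", return_tensors="pt")  # "The capital of India"
model.generate(**inputs, max_new_tokens=50, streamer=streamer)
```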
## Features

- Bilingual Capabilities: Optimized for both Hindi and English text processing
- Long Context Window: Supports sequences up to 8,192 tokens
- Efficient Architecture: Uses a single key-value head for attention (multi-query attention), reducing KV-cache memory; see the sketch after this list
- Large Vocabulary: 256,000-token vocabulary covering diverse Hindi and English text
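The single key-value head is what keeps the KV cache small at long context. A back-of-the-envelope estimate, using the standard cache-size formula and the numbers from the architecture section:

```python
# KV-cache size at the full 8,192-token context in float16 (2 bytes/value):
# 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_value
layers, head_dim, seq_len, fp16 = 18, 256, 8192, 2
mqa = 2 * layers * 1 * head_dim * seq_len * fp16   # 1 KV head, as configured
mha = 2 * layers * 8 * head_dim * seq_len * fp16   # hypothetical 8-head baseline
print(f"MQA: {mqa / 2**20:.0f} MiB vs full MHA: {mha / 2**20:.0f} MiB")  # 144 vs 1152
```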
## Performance Considerations
- Model weights are stored in float16 format for efficient memory usage
- Attention computation is optimized with disabled bias terms
- Supports caching for improved inference speed
- Uses RMSNorm with epsilon of 1e-06 for stable training
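For reference, the standard RMSNorm computation with the epsilon listed above looks like this; it is the common formulation, not necessarily byte-identical to the model's internal implementation:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """RMSNorm: rescale by the reciprocal root-mean-square of the features."""
    variance = x.pow(2).mean(dim=-1, keepdim=True)   # mean of squares over hidden dim
    return x * torch.rsqrt(variance + eps) * weight  # eps=1e-6 matches the card
```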
## Limitations

- Maximum context length of 8,192 tokens; longer inputs must be truncated (see the snippet below)
- Primarily optimized for Hindi and English
- float16 precision may affect numerical stability in some cases
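To stay within the context window, truncate at tokenization time; this reuses the tokenizer from the Usage section:

```python
# Truncate overly long inputs to the 8,192-token maximum at encoding time.
long_text = "यह एक बहुत लंबा दस्तावेज़ है। " * 5000  # "This is a very long document."
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=8192)
```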
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{akshara2b2025,
  title={Akshara-2B-Hindi: A Bilingual Language Model},
  author={SVECTOR CORPORATION},
  year={2025},
}
```
## License

MIT License

Copyright (c) 2025 SVECTOR CORPORATION