---
license: mit
language:
- hi
- en
pipeline_tag: translation
tags:
- SVECTOR
- LLM
- Hindi
- India
---

# Akshara-2B-Hindi Language Model

## Overview

Akshara-2B-Hindi is a 2-billion-parameter language model optimized for Hindi and English text processing. Built on a decoder-only causal architecture, it delivers strong performance on natural language understanding and generation tasks in both languages.

## Model Architecture

### Core Specifications

- **Base Architecture**: AksharaForCausalLM
- **Model Type**: Causal Language Model (`akshara`)
- **Hidden Size**: 2048
- **Number of Layers**: 18
- **Attention Heads**: 8
- **Key-Value Heads**: 1 (multi-query attention)
- **Intermediate Size**: 16384
- **Head Dimension**: 256
- **Vocabulary Size**: 256,000 tokens
- **Maximum Sequence Length**: 8,192 tokens
- **Parameters**: ~2 billion

### Technical Details

- **Attention Mechanism**:
  - Attention Bias: Disabled
  - Attention Dropout: 0.0
- **RMS Norm Epsilon**: 1e-06
- **Initializer Range**: 0.02
- **RoPE Theta**: 10000.0
- **RoPE Scaling**: None

### Special Tokens

- **BOS Token ID**: 2
- **EOS Token ID**: 1
- **PAD Token ID**: 0

### Implementation Details

- **Activation Function**: GELU
- **Model Dtype**: float16
- **Cache Usage**: Enabled
- **Transformers Version**: 4.38.1
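
The specifications above map onto standard Hugging Face config fields, so they can be checked directly from the published config. The attribute names below follow common `transformers` conventions (e.g. `num_key_value_heads`) and are an assumption about this model's config class; `trust_remote_code=True` may also be needed if the `akshara` architecture is not bundled with your `transformers` version.

```python
from transformers import AutoConfig

# May require trust_remote_code=True for a custom architecture (assumption).
config = AutoConfig.from_pretrained("SVECTOR-CORPORATION/Akshara-2B-Hindi")

# Attribute names follow common transformers conventions; assumed to apply here.
print(config.hidden_size)              # expected: 2048
print(config.num_hidden_layers)        # expected: 18
print(config.num_attention_heads)      # expected: 8
print(config.num_key_value_heads)      # expected: 1
print(config.vocab_size)               # expected: 256000
print(config.max_position_embeddings)  # expected: 8192
```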

## Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SVECTOR-CORPORATION/Akshara-2B-Hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Weights are stored in float16; loading in the same dtype halves memory use.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
```

### Basic Text Generation

```python
text = "आज का मौसम"  # Example Hindi prompt: "Today's weather"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(
    **inputs,  # passes input_ids and attention_mask together
    max_length=100,
    temperature=0.7,
    do_sample=True,
)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
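
For quick experiments, the model and tokenizer loaded above can also be wrapped in the high-level `pipeline` API. This is generic `transformers` usage rather than anything Akshara-specific; the Hindi prompt is just an illustrative example.

```python
from transformers import pipeline

# Reuse the model and tokenizer objects created in "Loading the Model".
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(
    "भारत की राजधानी",  # "The capital of India"
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```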

## Features

- **Bilingual Capabilities**: Optimized for both Hindi and English text processing
- **Long Context Window**: Supports sequences up to 8,192 tokens (see the truncation sketch after this list)
- **Efficient Architecture**: Uses a single key-value head for attention (multi-query attention), reducing memory and compute during inference
- **Large Vocabulary**: 256,000-token vocabulary covering diverse Hindi and English text
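
Because the context window tops out at 8,192 tokens, longer documents should be truncated or chunked at tokenization time. A minimal sketch using standard tokenizer arguments (the document variable is a placeholder):

```python
long_text = "..."  # placeholder for a long Hindi/English document

inputs = tokenizer(
    long_text,
    return_tensors="pt",
    truncation=True,
    max_length=8192,  # matches the model's maximum sequence length
)
```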

## Performance Considerations

- Model weights are stored in float16 format for efficient memory usage (a typical setup is sketched below)
- Attention computation is simplified by disabling bias terms
- Key-value caching is supported for faster autoregressive inference
- Uses RMSNorm with an epsilon of 1e-06 for stable training
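
Putting these points together, a typical GPU inference setup might look like the following. This is a sketch under stated assumptions, not an official recipe: `device_map="auto"` assumes the `accelerate` package is installed, and `use_cache=True` is already the default in `transformers`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SVECTOR-CORPORATION/Akshara-2B-Hindi"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # matches the stored weight dtype
    device_map="auto",          # requires `pip install accelerate` (assumption)
)
model.eval()

inputs = tokenizer("नमस्ते", return_tensors="pt").to(model.device)  # "Hello"
# use_cache=True (the default) reuses past key/value states across decode
# steps; with a single KV head the cache footprint stays small.
outputs = model.generate(**inputs, max_new_tokens=50, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```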

## Limitations

- Maximum context length of 8,192 tokens
- Primarily optimized for Hindi and English; other languages are not a focus
- float16 precision may affect numerical stability in some cases

## Citation

If you use this model in your research, please cite:

```
@misc{akshara2b2025,
  title={Akshara-2B-Hindi: A Bilingual Language Model},
  author={SVECTOR CORPORATION},
  year={2025},
}
```

## [Akshara](https://www.svector.co.in/akshara)

## License

MIT License

Copyright (c) 2025 SVECTOR CORPORATION