---
license: mit
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-135M
library_name: transformers
---

# Model Name

SmolLM2-135M

## Model Description

- [SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) is a 135M-parameter model based on the Llama 3 architecture.
- It is trained on the [Cosmopedia-2](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) dataset.
- The purpose of this model is to integrate DeepSeek architecture components, such as Multi-Head Latent Attention and the DeepSeek Mixture-of-Experts block, into the SmolLM2 architecture (see the illustrative sketch after the usage example).
- The model was trained from scratch for 15 hours on a g5.2xlarge instance (single 24 GB A10G GPU).
- Training ran for 100,000 steps (batch size 8, effective batch size 16, context length 512).

## Base Tokenizer

[Cosmo2-tokenizer](https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer)

## Usage Example

```python
import torch
import yaml
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

from deepseek_v3 import DeepSeekV3Model  # model definition from this repo

# Download the trained weights
model_path = hf_hub_download(
    repo_id="crpatel/DeepSeek-V3-SmolLm2",
    filename="model.pt",
)

# Load the model configuration
with open("config_smollm2_135M.yaml", "r") as f:
    config = yaml.safe_load(f)

# Initialize the model and load the weights
model = DeepSeekV3Model(config["model"])
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")

# Encode the prompt
encoded_text = tokenizer.encode("Once Upon time ", return_tensors="pt").to("cpu")
print(encoded_text)

# Generate text
generated_text = model.generate(
    idx=encoded_text,
    max_new_tokens=100,
    context_length=50,
    temperature=0.9,
    top_k=2,
    eos_token=tokenizer.eos_token_id,
    device="cpu",
)

# Decode and print the generated text
decoded_text = tokenizer.decode(generated_text.squeeze(0))
print(decoded_text)
```
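
## Architecture Sketch (Illustrative)

The exact layer implementations used for training live in this repo's `deepseek_v3.py`. The snippet below is only a minimal, self-contained sketch of the two DeepSeek-style components mentioned above: Multi-Head Latent Attention (keys and values compressed through a small latent vector and re-expanded per head) and a DeepSeek-style Mixture-of-Experts block (one always-active shared expert plus top-k routed experts). The class names, dimensions, and dense routing loop here are illustrative assumptions, not the trained model's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadLatentAttention(nn.Module):
    """Sketch of MLA: K/V are compressed to a small latent and re-expanded
    per head, which is what shrinks the KV cache. Real MLA also compresses
    queries and handles RoPE with a decoupled branch; omitted here."""

    def __init__(self, d_model=576, n_heads=9, kv_latent_dim=128):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.kv_down = nn.Linear(d_model, kv_latent_dim, bias=False)  # compress
        self.k_up = nn.Linear(kv_latent_dim, d_model, bias=False)     # expand keys
        self.v_up = nn.Linear(kv_latent_dim, d_model, bias=False)     # expand values
        self.o_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x)
        kv_latent = self.kv_down(x)   # (B, T, kv_latent_dim) -- this is what would be cached
        k = self.k_up(kv_latent)
        v = self.v_up(kv_latent)
        # split into heads: (B, n_heads, T, head_dim)
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.o_proj(out)


class DeepSeekStyleMoE(nn.Module):
    """Sketch of a DeepSeek-style MoE block: a shared expert that always runs,
    plus a router that sends each token to its top-k routed experts."""

    def __init__(self, d_model=576, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x):
        scores = self.router(x).softmax(dim=-1)                 # (B, T, n_experts)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        # keep routing weights only for the selected experts, zero elsewhere
        gate = torch.zeros_like(scores).scatter(-1, topk_idx, topk_scores)
        out = self.shared_expert(x)
        for i, expert in enumerate(self.experts):
            # dense for readability; real implementations dispatch tokens sparsely
            out = out + gate[..., i:i + 1] * expert(x)
        return out


# Quick shape check on random data
x = torch.randn(2, 16, 576)
print(MultiHeadLatentAttention()(x).shape)  # torch.Size([2, 16, 576])
print(DeepSeekStyleMoE()(x).shape)          # torch.Size([2, 16, 576])
```

A production implementation would cache the compressed KV latent during generation and route tokens to experts sparsely; the dense loop above simply keeps the routing math easy to read.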