Model Name: SmolLM2-135M
Model Description
- SmolLM2-135M is a 135M-parameter model based on the Llama 3 architecture.
- It is trained on the Cosmopedia-2 dataset.
- The purpose of this model is to integrate DeepSeek architecture components, such as Multi-Head Latent Attention (MLA) and the DeepSeek Mixture of Experts (MoE), into the SmolLM2 architecture; a minimal sketch of the MoE block follows this list.
- I trained the model from scratch for 15 hours on an AWS g5.2xlarge instance (single 24 GB A10G GPU).
- Training ran for 100,000 steps (batch size 8, effective batch size 16, context length 512).
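The card names the DeepSeek-style MoE but does not show it, so here is a minimal, hypothetical sketch of a DeepSeek-flavoured MoE block (an always-active shared expert plus top-k routed experts). All class names and sizes are assumptions for illustration (dim=576 matches SmolLM2-135M's hidden size); this is not the implementation in deepseek_v3.py, and it omits the load-balancing and fine-grained expert segmentation used in the real DeepSeekMoE.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    # SwiGLU feed-forward block used by every expert (illustrative sizes)
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class DeepSeekStyleMoE(nn.Module):
    # One always-active shared expert plus top-k routed experts per token
    def __init__(self, dim=576, hidden=1536, n_routed=8, n_shared=1, top_k=2):
        super().__init__()
        self.shared = nn.ModuleList([Expert(dim, hidden) for _ in range(n_shared)])
        self.routed = nn.ModuleList([Expert(dim, hidden) for _ in range(n_routed)])
        self.router = nn.Linear(dim, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):                                   # x: (batch, seq, dim)
        out = sum(e(x) for e in self.shared)                # shared path sees every token
        probs = F.softmax(self.router(x), dim=-1)           # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)       # top-k experts per token
        for k in range(self.top_k):
            for e_id, expert in enumerate(self.routed):
                mask = idx[..., k] == e_id                  # tokens whose k-th pick is this expert
                if mask.any():
                    out[mask] = out[mask] + weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check: (2, 16, 576) in, (2, 16, 576) out
moe = DeepSeekStyleMoE()
print(moe(torch.randn(2, 16, 576)).shape)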
Base Tokenizer: HuggingFaceTB/cosmo2-tokenizer
Usage Example
import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from deepseek_v3 import DeepSeekV3Model
import yaml
# Download the model file
model_path = hf_hub_download(
    repo_id="crpatel/DeepSeek-V3-SmolLm2",
    filename="model.pt"
)
# Load configuration
with open('config_smollm2_135M.yaml', 'r') as f:
    config = yaml.safe_load(f)
# Initialize model
model = DeepSeekV3Model(config['model'])
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()  # switch to inference mode (disables dropout, if any)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")
# Encode input text
encoded_text = tokenizer.encode('Once Upon time ', return_tensors="pt").to('cpu')
print(encoded_text)
# Generate text
generated_text = model.generate(
    idx=encoded_text,
    max_new_tokens=100,
    context_length=50,
    temperature=0.9,
    top_k=2,
    eos_token=tokenizer.eos_token_id,
    device='cpu'
)
# Decode and print the generated text
decoded_text = tokenizer.decode(generated_text.squeeze(0))
print(decoded_text)
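If a GPU is available, the same example can run on it; this assumes the custom generate() honours its device argument the same way as the CPU call above.

# Optional: move the model and inputs to GPU when one is available
# (assumption: generate() accepts device='cuda' just like device='cpu' above)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
encoded_text = encoded_text.to(device)
# then call model.generate(..., device=device) as in the example above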
Model tree for crpatel/DeepSeek-V3-SmolLm2
Base model: HuggingFaceTB/SmolLM2-135M