Model Name: SmolLM2-135M
Model Description
- SmolLM2-135M is a 135M-parameter model based on the Llama 3 architecture.
- It is trained on the Cosmopedia-2 dataset.
- The purpose of this model is to integrate DeepSeek architecture components, such as Multi-Head Latent Attention (MLA) and the DeepSeek Mixture of Experts (MoE), into the SmolLM2 architecture; a minimal sketch of the MoE block follows this list.
- I trained the model from scratch for 15 hours on an AWS g5.2xlarge instance (single 24 GB A10G GPU).
- Training ran for 100,000 steps (batch size 8, effective batch size 16, context length 512).
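The card names the DeepSeek-style MoE but does not show it, so here is a minimal, hypothetical sketch of a DeepSeek-flavoured MoE block (an always-active shared expert plus top-k routed experts). All class names and sizes are assumptions for illustration (dim=576 matches SmolLM2-135M's hidden size); this is not the implementation in deepseek_v3.py, and it omits the load-balancing and fine-grained expert segmentation used in the real DeepSeekMoE.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    # SwiGLU feed-forward block used by every expert (illustrative sizes)
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class DeepSeekStyleMoE(nn.Module):
    # One always-active shared expert plus top-k routed experts per token
    def __init__(self, dim=576, hidden=1536, n_routed=8, n_shared=1, top_k=2):
        super().__init__()
        self.shared = nn.ModuleList([Expert(dim, hidden) for _ in range(n_shared)])
        self.routed = nn.ModuleList([Expert(dim, hidden) for _ in range(n_routed)])
        self.router = nn.Linear(dim, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):                                   # x: (batch, seq, dim)
        out = sum(e(x) for e in self.shared)                # shared path sees every token
        probs = F.softmax(self.router(x), dim=-1)           # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)       # top-k experts per token
        for k in range(self.top_k):
            for e_id, expert in enumerate(self.routed):
                mask = idx[..., k] == e_id                  # tokens whose k-th pick is this expert
                if mask.any():
                    out[mask] = out[mask] + weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check: (2, 16, 576) in, (2, 16, 576) out
moe = DeepSeekStyleMoE()
print(moe(torch.randn(2, 16, 576)).shape)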
Base Tokenizer: HuggingFaceTB/cosmo2-tokenizer
Usage Example
import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from deepseek_v3 import DeepSeekV3Model
import yaml
# Download the model file
model_path = hf_hub_download(
    repo_id="crpatel/DeepSeek-V3-SmolLm2",
    filename="model.pt"
)
# Load configuration
with open('config_smollm2_135M.yaml', 'r') as f:
    config = yaml.safe_load(f)
# Initialize model
model = DeepSeekV3Model(config['model'])
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()  # switch to inference mode (disables dropout, if any)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")
# Encode input text
encoded_text = tokenizer.encode('Once Upon time ', return_tensors="pt").to('cpu')
print(encoded_text)
# Generate text
generated_text = model.generate(
    idx=encoded_text,
    max_new_tokens=100,
    context_length=50,
    temperature=0.9,
    top_k=2,
    eos_token=tokenizer.eos_token_id,
    device='cpu'
)
# Decode and print the generated text
decoded_text = tokenizer.decode(generated_text.squeeze(0))
print(decoded_text)
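If a GPU is available, the same example can run on it; this assumes the custom generate() honours its device argument the same way as the CPU call above.

# Optional: move the model and inputs to GPU when one is available
# (assumption: generate() accepts device='cuda' just like device='cpu' above)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
encoded_text = encoded_text.to(device)
# then call model.generate(..., device=device) as in the example above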
Model tree for crpatel/DeepSeek-V3-SmolLm2
Base model: HuggingFaceTB/SmolLM2-135M