Library: Transformers · Language: English

Model Name

SmolLM2-135M

Model Description

  • SmolLM2-135M is a 135M-parameter model based on the Llama 3 architecture.
  • It is trained on the Cosmopedia-2 dataset.
  • The purpose of this model is to integrate DeepSeek architecture components, such as MultiHeadLatentAttention and DeepSeekMixureOfExperts, into the SmolLM2 architecture (see the sketch after this list).
  • The model was trained from scratch for 15 hours on a g5.2xlarge instance (single 24 GB A10G GPU).
  • Training ran for 100,000 steps (batch size 8, effective batch size 16, i.e. 2 gradient-accumulation steps, context length 512).
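
For reference, the core idea of MultiHeadLatentAttention is to compress the hidden states into a small shared latent and expand that latent back into per-head keys and values, so a KV cache only needs to store the latent. Below is a minimal, illustrative PyTorch sketch: the class name, dimensions, and arguments are placeholders, RoPE and the decoupled key path are omitted, and this is not the actual deepseek_v3 implementation (the MixtureOfExperts block is likewise not shown).

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLASketch(nn.Module):
    # Illustrative Multi-Head Latent Attention block (placeholder dims, no RoPE).
    def __init__(self, d_model=576, n_heads=9, kv_latent_dim=128):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # Keys/values are compressed into one shared low-rank latent ...
        self.kv_down = nn.Linear(d_model, kv_latent_dim, bias=False)
        # ... and expanded back into full per-head keys and values.
        self.k_up = nn.Linear(kv_latent_dim, d_model, bias=False)
        self.v_up = nn.Linear(kv_latent_dim, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        B, T, C = x.shape
        latent = self.kv_down(x)               # (B, T, kv_latent_dim): all a KV cache would need to store
        q = self.q_proj(x)
        k = self.k_up(latent)
        v = self.v_up(latent)
        # Split into heads: (B, n_heads, T, head_dim)
        q, k, v = (t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, C)
        return self.o_proj(out)

For example, MLASketch()(torch.randn(1, 16, 576)) returns a tensor of shape (1, 16, 576).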

Base Tokenizer

Cosmo2-tokenizer

Usage Example

import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from deepseek_v3 import DeepSeekV3Model
import yaml

# Download the model file
model_path = hf_hub_download(
    repo_id="crpatel/DeepSeek-V3-SmolLm2",
    filename="model.pt"
)

# Load configuration
with open('config_smollm2_135M.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Initialize model
model = DeepSeekV3Model(config['model'])
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()  # inference mode: disables dropout

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")

# Encode input text
encoded_text = tokenizer.encode('Once upon a time ', return_tensors="pt").to('cpu')
print(encoded_text)

# Generate text
generated_text = model.generate(
    idx=encoded_text, 
    max_new_tokens=100, 
    context_length=50, 
    temperature=0.9,
    top_k=2, 
    eos_token=tokenizer.eos_token_id, 
    device='cpu'
)

# Decode and print the generated text
decoded_text = tokenizer.decode(generated_text.squeeze(0))
print(decoded_text)
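
The generate call above uses the repo's own sampling helper rather than the Transformers generation API. To clarify what those arguments do, a typical autoregressive loop with that signature looks roughly like this (an illustrative sketch, not the code shipped in deepseek_v3):

import torch

@torch.no_grad()
def generate_sketch(model, idx, max_new_tokens, context_length,
                    temperature=1.0, top_k=None, eos_token=None, device='cpu'):
    # Autoregressive sampling: crop context, forward pass, sample next token, append.
    model.eval()
    idx = idx.to(device)
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_length:]      # keep at most context_length tokens
        logits = model(idx_cond)[:, -1, :]       # assumes the model returns raw logits of shape (B, T, vocab)
        logits = logits / max(temperature, 1e-8)
        if top_k is not None:
            kth = torch.topk(logits, top_k).values[:, -1, None]
            logits = logits.masked_fill(logits < kth, float('-inf'))  # keep only the top_k logits
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_token], dim=1)
        if eos_token is not None and next_token.item() == eos_token:
            break                                 # stop once EOS is sampled (batch size 1)
    return idx

With top_k=2 and temperature=0.9 as in the example, sampling is restricted to the two most likely tokens at each step.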