---
license: mit
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
base_model:
- HuggingFaceTB/SmolLM2-135M
library_name: transformers
---

# Model Name

SmolLM2-135M

## Model Description

- [SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M) is a 135M-parameter model based on the Llama 3 architecture.
- It is trained on the [Cosmopedia-2](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) dataset.
- The purpose of this model is to integrate DeepSeek architecture components, such as Multi-Head Latent Attention and the DeepSeek Mixture-of-Experts block, into the SmolLM2 architecture (see the illustrative sketch after the usage example).
- The model was trained from scratch for 15 hours on a g5.2xlarge instance (single 24 GB A10G GPU).
- Training ran for 100,000 steps (batch size 8, effective batch size 16, context length 512).

## Base Tokenizer

[Cosmo2-tokenizer](https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer)

## Usage Example

```python
import torch
import yaml
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

from deepseek_v3 import DeepSeekV3Model  # model definition from this repo

# Download the trained weights
model_path = hf_hub_download(
    repo_id="crpatel/DeepSeek-V3-SmolLm2",
    filename="model.pt",
)

# Load the model configuration
with open("config_smollm2_135M.yaml", "r") as f:
    config = yaml.safe_load(f)

# Initialize the model and load the weights
model = DeepSeekV3Model(config["model"])
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")

# Encode the prompt
encoded_text = tokenizer.encode("Once Upon time ", return_tensors="pt").to("cpu")
print(encoded_text)

# Generate text
generated_text = model.generate(
    idx=encoded_text,
    max_new_tokens=100,
    context_length=50,
    temperature=0.9,
    top_k=2,
    eos_token=tokenizer.eos_token_id,
    device="cpu",
)

# Decode and print the generated text
decoded_text = tokenizer.decode(generated_text.squeeze(0))
print(decoded_text)
```
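
## Architecture Sketch (Illustrative)

The exact layer implementations used for training live in this repo's `deepseek_v3.py`. The snippet below is only a minimal, self-contained sketch of the two DeepSeek-style components mentioned above: Multi-Head Latent Attention (keys and values compressed through a small latent vector and re-expanded per head) and a DeepSeek-style Mixture-of-Experts block (one always-active shared expert plus top-k routed experts). The class names, dimensions, and dense routing loop here are illustrative assumptions, not the trained model's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadLatentAttention(nn.Module):
    """Sketch of MLA: K/V are compressed to a small latent and re-expanded
    per head, which is what shrinks the KV cache. Real MLA also compresses
    queries and handles RoPE with a decoupled branch; omitted here."""

    def __init__(self, d_model=576, n_heads=9, kv_latent_dim=128):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.kv_down = nn.Linear(d_model, kv_latent_dim, bias=False)  # compress
        self.k_up = nn.Linear(kv_latent_dim, d_model, bias=False)     # expand keys
        self.v_up = nn.Linear(kv_latent_dim, d_model, bias=False)     # expand values
        self.o_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x)
        kv_latent = self.kv_down(x)   # (B, T, kv_latent_dim) -- this is what would be cached
        k = self.k_up(kv_latent)
        v = self.v_up(kv_latent)
        # split into heads: (B, n_heads, T, head_dim)
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.o_proj(out)


class DeepSeekStyleMoE(nn.Module):
    """Sketch of a DeepSeek-style MoE block: a shared expert that always runs,
    plus a router that sends each token to its top-k routed experts."""

    def __init__(self, d_model=576, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x):
        scores = self.router(x).softmax(dim=-1)                 # (B, T, n_experts)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        # keep routing weights only for the selected experts, zero elsewhere
        gate = torch.zeros_like(scores).scatter(-1, topk_idx, topk_scores)
        out = self.shared_expert(x)
        for i, expert in enumerate(self.experts):
            # dense for readability; real implementations dispatch tokens sparsely
            out = out + gate[..., i:i + 1] * expert(x)
        return out


# Quick shape check on random data
x = torch.randn(2, 16, 576)
print(MultiHeadLatentAttention()(x).shape)  # torch.Size([2, 16, 576])
print(DeepSeekStyleMoE()(x).shape)          # torch.Size([2, 16, 576])
```

A production implementation would cache the compressed KV latent during generation and route tokens to experts sparsely; the dense loop above simply keeps the routing math easy to read.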