---
language: en
tags:
- pytorch
- causal-lm
- language-model
- text-generation
license: mit
datasets:
- wikitext
metrics:
- perplexity
- loss
library_name: pytorch
pipeline_tag: text-generation
model-index:
- name: SmolLM-125M
results:
- task:
type: text-generation
name: Language Modeling
dataset:
type: wikitext
name: WikiText-2
metrics:
- type: perplexity
value: to_be_updated # Will be updated after training
- type: loss
value: to_be_updated # Will be updated after training
---
# SmolLM-125M: A Lightweight Language Model for Consumer Hardware
SmolLM-125M is a 125M-parameter language model designed to be trained and run on consumer hardware with limited VRAM (4GB+). It follows a GPT-style architecture and is optimized for efficiency and memory usage.
## Model Details
- **Architecture:** GPT-style Transformer
- **Parameters:** 125M
- **Context Length:** 512 tokens
- **Vocabulary:** 50,257 tokens (GPT-2 tokenizer)
- **Training Data:** WikiText-2
- **Hardware Requirements:** 4GB+ VRAM GPU
### Architecture Specifications
- Layers: 12 transformer blocks
- Attention Heads: 12
- Embedding Dimension: 768
- Activation: GELU
- Layer Normalization: Pre-norm (see the sketch below)
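For reference, a pre-norm block in this configuration looks roughly like the following. This is a minimal PyTorch sketch, not the repository's actual `model.py`; the `Block` class name and the use of `nn.MultiheadAttention` are illustrative assumptions.

```python
import torch.nn as nn

class Block(nn.Module):
    """Minimal pre-norm transformer block (illustrative, not the repository's exact code)."""

    def __init__(self, n_embd=768, n_head=12, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
            nn.Dropout(dropout),
        )

    def forward(self, x, attn_mask=None):
        # Pre-norm: LayerNorm runs before each sub-layer, followed by a residual add.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x
```

Applying LayerNorm before the attention and MLP sub-layers (pre-norm) tends to make training more stable at this scale than the original post-norm arrangement.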
## Training Details
- **Hardware Used:** GTX 1650 (4GB VRAM)
- **Training Time:** ~4 hours
- **Batch Size:** 4 (16 with gradient accumulation)
- **Learning Rate:** 3e-4 with cosine decay
- **Weight Decay:** 0.1
- **Optimizer:** AdamW (see the setup sketch below)
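A minimal sketch of this optimizer and schedule, assuming a standard PyTorch setup; the betas and `total_steps` values are assumptions, not taken from the training script:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Linear(768, 768)  # stand-in module; in practice this is the SmallLanguageModel instance

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,            # peak learning rate from the list above
    weight_decay=0.1,   # weight decay from the list above
    betas=(0.9, 0.95),  # assumed; not stated in this card
)

# Cosine decay of the learning rate over the run.
total_steps = 10_000  # assumed; set to your actual number of optimizer steps
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)
```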
### Memory Optimizations
1. Length-based batch scheduling
2. Gradient accumulation (4 steps; see the sketch after this list)
3. Dynamic batch scheduling
4. Pre-padded sequences
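Gradient accumulation is what gives the effective batch size of 16 on 4GB of VRAM: the loss from 4 micro-batches of size 4 is accumulated before each optimizer step. A minimal sketch, continuing from the optimizer setup above; `train_loader`, the `(logits, loss)` return convention, and gradient clipping are assumptions:

```python
accum_steps = 4  # micro-batches per optimizer step (4 x 4 = effective batch size 16)

optimizer.zero_grad(set_to_none=True)
for step, (input_ids, targets) in enumerate(train_loader):
    logits, loss = model(input_ids, targets)  # assumes the model returns (logits, loss)
    (loss / accum_steps).backward()           # scale so gradients average over micro-batches

    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clipping value is assumed
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad(set_to_none=True)
```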
## Usage
```python
import torch
from transformers import AutoTokenizer
from model import SmallLanguageModel, ModelConfig

# Initialize the model with the architecture described above
config = ModelConfig(
    vocab_size=50257,
    block_size=512,
    n_layer=12,
    n_head=12,
    n_embd=768,
    dropout=0.1,
    bias=True,
)
model = SmallLanguageModel(config)

# Load the trained weights (adjust the checkpoint path to match the file in this repo)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Generate text using the GPT-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
with torch.no_grad():
    output_ids = model.generate(input_ids, max_length=100)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```
## Limitations
- Limited context window (512 tokens)
- Small model capacity compared to larger language models, which limits fluency and factual recall
- Trained only on WikiText-2, so domain and knowledge coverage are narrow
## License
This model is released under the MIT License.