jekunz/smollm-135m-cpt-fineweb-swedish

Text Generation


jekunz committed on Jan 31 · Commit f00d369 (verified) · 1 parent: a82a756

Create README.md

Files changed (1)

README.md +22 -0

README.md ADDED

	@@ -0,0 +1,22 @@

+---
+license: apache-2.0
+datasets:
+- HuggingFaceFW/fineweb-2
+language:
+- sv
+base_model:
+- HuggingFaceTB/SmolLM2-135M-Instruct
+pipeline_tag: text-generation
+---
+This is a SmolLM2-135M-Instruct model fine-tuned on the Swedish portion of FineWeb-2. It was trained for my own research and has not yet been evaluated more broadly.
+
+Training:
+- Epochs: 1
+- Learning rate: 5e-4
+- LR scheduler: cosine
+- Warmup ratio: 0.05
+- Batch size per GPU: 1
+- GPUs: 4× NVIDIA A100 (40 GB)
+- Gradient accumulation steps: 64
+- Effective batch size: 256 (1 per GPU × 4 GPUs × 64 accumulation steps)
+- Maximum context length: 8192 tokens
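The training settings listed in the README can be checked with a short Python sketch. It reproduces the effective-batch-size arithmetic and one plausible learning-rate curve.

The README does not say which training library or scheduler implementation was used. The sketch therefore assumes the following:

- The learning rate rises linearly over the first 5% of optimizer steps.
- It then decays along a cosine curve to zero. This is the shape produced by `get_cosine_schedule_with_warmup` in Hugging Face `transformers`.
- The tokens-per-update figure is only an upper bound. It assumes every sequence fills the full 8192-token context.

The helper names and the 1,000-step total in the demo are hypothetical.

```python
import math

# Values copied from the README's training list.
PEAK_LR = 5e-4
WARMUP_RATIO = 0.05
PER_DEVICE_BATCH = 1
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 64
MAX_CONTEXT = 8192


def effective_batch_size(per_device: int, num_devices: int, accum_steps: int) -> int:
    """Number of sequences that contribute to one optimizer update."""
    return per_device * num_devices * accum_steps


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = PEAK_LR,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Hypothetical helper (assumed schedule shape, not confirmed by the README):
    linear warmup from 0 to peak_lr, then cosine decay from peak_lr to 0."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    batch = effective_batch_size(PER_DEVICE_BATCH, NUM_GPUS, GRAD_ACCUM_STEPS)
    print(batch)                # 256
    # Upper bound: assumes every sequence is filled to the maximum context length.
    print(batch * MAX_CONTEXT)  # 2097152

    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 25, 50, 525, 1000):
        print(step, cosine_lr_with_warmup(step, total))
```

Under these assumptions, the learning rate reaches 5e-4 at the end of warmup (step 50 of 1,000 in the demo). It falls to half that value midway through the decay phase and reaches zero at the final step.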