Where are the SmolLM3 1B, 2B, and 0.6B models?
Where??
hmm..........
Stay tuned :)
Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, 120M, or even 75M parameters? It would be better than SmolLM2 while also being smaller.
I'm curious, do you see a cool use case for a 75M parameter model?
eliebak said:
I'm curious, do you see a cool use case for a 75M parameter model?
Yes, I want to run it even on a potato PC like this one (not so potato that it runs Windows 95, but nothing like a PC with an RTX 5090, for example).
@MihaiPopa-1 I think Elie was asking which specific tasks such a small model would be able to perform with high enough accuracy to be useful (e.g., spell-checking, translation, summarization)... and how you would use the model :)
Stay tuned :)
OK, hoping for better and brighter days for SmolLM. SmolLM = elegance, power, performance: the best model for low-end devices.
Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, 120M, or even 75M parameters? It would be better than SmolLM2 while also being smaller.
Yes, it's possible, using the knowledge distillation method.
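For context, the core idea is simple: train the small student model to match the output distribution of a larger teacher. Here's a minimal sketch of the classic (Hinton-style) distillation term in PyTorch; the temperature value is illustrative, and in practice this is usually mixed with the standard cross-entropy loss on the labels.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the temperature, then push the
    # student's distribution toward the teacher's via KL divergence.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)
```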
eliebak said:
I'm curious, do you see a cool use case for a 75M parameter model?
Yes, I want to run it even on a potato PC like this one (not so potato that it runs Windows 95, but nothing like a PC with an RTX 5090, for example).
You might not believe it, but both large and small models can run on integrated graphics. You don't necessarily need a dedicated GPU; just make sure the shared memory for the integrated graphics is at least 4 GB.
I use an Intel HD Graphics 520 myself, and it performs great, especially with MoE models.
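As a concrete illustration, here's a minimal sketch using llama-cpp-python with a quantized GGUF model. The model path is a placeholder, and offloading to an integrated GPU assumes llama.cpp was built with a matching backend (e.g., Vulkan or SYCL); the stock CPU wheel will simply run on the CPU, which also works fine for small models.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/smollm-q4_k_m.gguf",  # hypothetical quantized checkpoint
    n_gpu_layers=-1,  # offload every layer that fits into (shared) GPU memory
    n_ctx=2048,       # modest context to stay within ~4 GB of shared memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why do small LLMs suit low-end hardware?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```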
I love neural networks with all my heart — they’re super cool, awesome, and fascinating!
Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, 120M, or even 75M parameters? It would be better than SmolLM2 while also being smaller.
Yes, it's possible, using the knowledge distillation method.
The TRL library has GKDConfig and GKDTrainer, so it's highly possible.
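For the curious, here's a minimal sketch of what that could look like with TRL's GKDConfig/GKDTrainer. The teacher checkpoint (HuggingFaceTB/SmolLM3-3B) is real; the student path is a hypothetical placeholder (it would need to share the teacher's tokenizer and vocabulary), the inline dataset is a stand-in, and the hyperparameters are illustrative rather than a tested recipe.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

teacher_name = "HuggingFaceTB/SmolLM3-3B"       # real checkpoint
student_name = "path/to/smaller-student-model"  # hypothetical smaller SmolLM3

tokenizer = AutoTokenizer.from_pretrained(teacher_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

# Tiny placeholder dataset in the conversational format GKDTrainer expects.
train_dataset = Dataset.from_dict({
    "messages": [
        [
            {"role": "user", "content": "What is knowledge distillation?"},
            {"role": "assistant", "content": "Training a small model to match a larger one."},
        ]
    ] * 64
})

args = GKDConfig(
    output_dir="smollm3-student-gkd",
    per_device_train_batch_size=1,
    temperature=0.9,  # softens the distributions in the divergence loss
    lmbda=0.5,        # fraction of on-policy (student-generated) samples
    beta=0.5,         # interpolation for the generalized JSD loss
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```

The nice part of generalized knowledge distillation is that it can mix supervised distillation with on-policy data generated by the student itself (controlled by lmbda), which tends to reduce the train/inference mismatch for small students.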
Yeah, thanks