Where are the SmolLM3 1B, 2B, and 0.6B models?

#19
by ysn-rfd - opened

Where??

hmm..........

Hugging Face Smol Models Research org

Stay tuned :)

Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, and 120M, even 75M? They would be better than SmolLM2 while also being smaller.

Hugging Face Smol Models Research org

I'm curious, do you see some cool use case for a 75M-parameter model?

eliebak said:

I'm curious, do you see some cool use case for a 75M-parameter model?

Yes, I want to run it even on a potato PC like this one (not so potato that it runs Windows 95, but nothing like a PC with an RTX 5090 either).

Hugging Face Smol Models Research org
edited Jul 20

@MihaiPopa-1 I think Elie was asking what specific tasks such a small model would be able to perform at a sufficiently high accuracy to be useful (e.g., spell-checking, translation, summarization)... and how you would use the model :)

Stay tuned :)

OK, hoping for better and brighter days for SmolLM. SmolLM = elegance, power, performance: the best models for low-end devices.

Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, and 120M, even 75M? They would be better than SmolLM2 while also being smaller.

Yes, it's possible, using knowledge distillation.
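For anyone curious what "knowledge distillation" means in practice: the small student model is trained to match the teacher's softened output distribution. A minimal toy sketch of the classic temperature-scaled KL loss (my own illustration, not SmolLM training code):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled, numerically stable softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Identical logits give zero loss; diverging logits give a positive one.
print(distillation_loss([2.0, 1.0, 0.5], [2.0, 1.0, 0.5]))  # 0.0
print(distillation_loss([5.0, 0.0, 0.0], [0.0, 5.0, 0.0]) > 0)  # True
```

Minimizing this loss over a dataset pushes the small model toward the big model's behavior, which is why a distilled 75M model can outperform one trained from scratch.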

eliebak said:

I'm curious, do you see some cool use case for a 75M-parameter model?

Yes, I want to run it even on a potato PC like this one (not so potato that it runs Windows 95, but nothing like a PC with an RTX 5090 either).

You might not believe it, but both large and small models can run on integrated graphics; you don't necessarily need a dedicated GPU. Just make sure the shared memory for the integrated graphics is at least 4 GB.

I use an Intel HD Graphics 520 myself, and it performs great, especially with MoE models.

I love neural networks with all my heart — they’re super cool, awesome, and fascinating!

Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, and 120M, even 75M? They would be better than SmolLM2 while also being smaller.

Yes, it's possible, using knowledge distillation.

The TRL library has GKDConfig and GKDTrainer, so it's highly possible.
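A rough sketch of what a GKD (Generalized Knowledge Distillation) run with TRL could look like. The checkpoint names, toy dataset, and hyperparameter values here are illustrative only; check the TRL docs for the current `GKDTrainer` API before copying this:

```python
from datasets import Dataset
from transformers import AutoTokenizer
from trl import GKDConfig, GKDTrainer

# Illustrative checkpoints: distil a small SmolLM2 student from a larger teacher.
teacher_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
student_id = "HuggingFaceTB/SmolLM2-135M-Instruct"

tokenizer = AutoTokenizer.from_pretrained(student_id)

# A real run would use a proper chat dataset; one toy sample keeps the sketch short.
train = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "What is knowledge distillation?"},
        {"role": "assistant", "content": "Training a small model to imitate a larger one."},
    ]},
])

args = GKDConfig(
    output_dir="smollm-gkd-student",
    lmbda=0.5,        # fraction of on-policy (student-generated) sequences
    beta=0.5,         # interpolation coefficient of the generalized JSD loss
    temperature=1.0,  # sampling/softening temperature
)

trainer = GKDTrainer(
    model=student_id,
    teacher_model=teacher_id,
    args=args,
    processing_class=tokenizer,
    train_dataset=train,
)
trainer.train()
```

`lmbda` controls how much of the training data the student generates itself, which is what distinguishes GKD from plain offline distillation.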


Yeah, thanks
