Where are the SmolLM3 1B, 2B, and 0.6B models?
Where??
hmm..........
Stay tuned :)
Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, 120M, or even 75M parameters? It would be better than SmolLM2 while also being smaller.
I'm curious, do you see a cool use case for a 75M parameter model?
eliebak said:
I'm curious, do you see a cool use case for a 75M parameter model?
Yes, I want to run it even on a potato PC like this one (not so potato that it runs Windows 95, but nothing like a PC with an RTX 5090, for example).
@MihaiPopa-1 I think Elie was asking which specific tasks such a small model would be able to perform with high enough accuracy to be useful (e.g., spell-checking, translation, summarization)... and how you would use the model :)
Stay tuned :)
OK, hoping for better and brighter days for SmolLM. SmolLM = elegance, power, performance: the best model for low-end devices.
Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, 120M, or even 75M parameters? It would be better than SmolLM2 while also being smaller.
Yes, it's possible, using the knowledge distillation method.
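For context, the core idea is simple: train the small student model to match the output distribution of a larger teacher. Here's a minimal sketch of the classic (Hinton-style) distillation term in PyTorch; the temperature value is illustrative, and in practice this is usually mixed with the standard cross-entropy loss on the labels.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the temperature, then push the
    # student's distribution toward the teacher's via KL divergence.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)
```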
eliebak said:
I'm curious, do you see a cool use case for a 75M parameter model?
Yes, I want to run it even on a potato PC like this one (not so potato that it runs Windows 95, but nothing like a PC with an RTX 5090, for example).
You might not believe it, but both large and small models can run on integrated graphics. You don't necessarily need a dedicated GPU; just make sure the shared memory for the integrated graphics is at least 4 GB.
I use an Intel HD Graphics 520 myself, and it performs great, especially with MoE models.
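As a concrete illustration, here's a minimal sketch using llama-cpp-python with a quantized GGUF model. The model path is a placeholder, and offloading to an integrated GPU assumes llama.cpp was built with a matching backend (e.g., Vulkan or SYCL); the stock CPU wheel will simply run on the CPU, which also works fine for small models.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/smollm-q4_k_m.gguf",  # hypothetical quantized checkpoint
    n_gpu_layers=-1,  # offload every layer that fits into (shared) GPU memory
    n_ctx=2048,       # modest context to stay within ~4 GB of shared memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why do small LLMs suit low-end hardware?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```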
I love neural networks with all my heart — they’re super cool, awesome, and fascinating!
Can you also train SmolLM3 at 1.5B, 0.4B, 0.25B, 120M, or even 75M parameters? It would be better than SmolLM2 while also being smaller.
Yes, it's possible, using the knowledge distillation method.
The TRL library has GKDConfig and GKDTrainer, so it's highly possible.
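For the curious, here's a minimal sketch of what that could look like with TRL's GKDConfig/GKDTrainer. The teacher checkpoint (HuggingFaceTB/SmolLM3-3B) is real; the student path is a hypothetical placeholder (it would need to share the teacher's tokenizer and vocabulary), the inline dataset is a stand-in, and the hyperparameters are illustrative rather than a tested recipe.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

teacher_name = "HuggingFaceTB/SmolLM3-3B"       # real checkpoint
student_name = "path/to/smaller-student-model"  # hypothetical smaller SmolLM3

tokenizer = AutoTokenizer.from_pretrained(teacher_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

# Tiny placeholder dataset in the conversational format GKDTrainer expects.
train_dataset = Dataset.from_dict({
    "messages": [
        [
            {"role": "user", "content": "What is knowledge distillation?"},
            {"role": "assistant", "content": "Training a small model to match a larger one."},
        ]
    ] * 64
})

args = GKDConfig(
    output_dir="smollm3-student-gkd",
    per_device_train_batch_size=1,
    temperature=0.9,  # softens the distributions in the divergence loss
    lmbda=0.5,        # fraction of on-policy (student-generated) samples
    beta=0.5,         # interpolation for the generalized JSD loss
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```

The nice part of generalized knowledge distillation is that it can mix supervised distillation with on-policy data generated by the student itself (controlled by lmbda), which tends to reduce the train/inference mismatch for small students.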
Yeah, thanks