surely, there's a lot of room for optimization, in terms of speed, but it's working and it's streaming the 18GB nicely :)
PZ
philipp-zettl
AI & ML interests
NLP/CV/Multimodal learning
Recent Activity
upvoted a changelog 2 days ago
Introducing Kernels upvoted a collection 6 days ago
deployed-models repliedto their post 8 days ago
I've been cooking something neat over the past weeks 👨🍳
We all know that training LLMs requires a lot of resources and especially a lot of compute in form of GPUs, or is super slow and inefficient when done on CPUs.
The big players use giant clusters of Nvidia H100s.
But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTX's. If you're lucky you got yourself a 5080 with 16GB VRAM or something.
To be frank, I don't have that 1.3k disposable cash laying around ¯\_(ツ)_/¯
But I can write rust and like building ML libraries.
So I asked myself the question(s):
- can I train SMLs at home on my hardware?
- How hard can it be to build a ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- how hard can it be to implement bf16 support?
The answers are wild, trust me!
Image 1: Metrics form last nights build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8GB VRAM)
The majority of my time went into the shared memory, but it's stable and I'm very excited!
Here some debug logs, a la "trust me bro"
```
----
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6744 MB / 7805 MB
Data on GPU: 1641 MB
Grads on GPU: 3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6776 MB / 7805 MB
Data on GPU: 1561 MB
Grads on GPU: 3279 MB
CPU Offloaded: 18590 MB
-----------------------------
```
Final models get exported in `safetensors` format and are compatible with PyTorch and `transformers`, for accessibility.
- [^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memoryOrganizations
replied to their post 8 days ago
posted an update 8 days ago
Post
2557
I've been cooking something neat over the past weeks 👨🍳
We all know that training LLMs requires a lot of resources and especially a lot of compute in form of GPUs, or is super slow and inefficient when done on CPUs.
The big players use giant clusters of Nvidia H100s.
But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTX's. If you're lucky you got yourself a 5080 with 16GB VRAM or something.
To be frank, I don't have that 1.3k disposable cash laying around ¯\_(ツ)_/¯
But I can write rust and like building ML libraries.
So I asked myself the question(s):
- can I train SMLs at home on my hardware?
- How hard can it be to build a ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- how hard can it be to implement bf16 support?
The answers are wild, trust me!
Image 1: Metrics form last nights build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8GB VRAM)
The majority of my time went into the shared memory, but it's stable and I'm very excited!
Here some debug logs, a la "trust me bro"
Final models get exported in
- [^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory
We all know that training LLMs requires a lot of resources and especially a lot of compute in form of GPUs, or is super slow and inefficient when done on CPUs.
The big players use giant clusters of Nvidia H100s.
But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTX's. If you're lucky you got yourself a 5080 with 16GB VRAM or something.
To be frank, I don't have that 1.3k disposable cash laying around ¯\_(ツ)_/¯
But I can write rust and like building ML libraries.
So I asked myself the question(s):
- can I train SMLs at home on my hardware?
- How hard can it be to build a ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- how hard can it be to implement bf16 support?
The answers are wild, trust me!
Image 1: Metrics form last nights build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8GB VRAM)
The majority of my time went into the shared memory, but it's stable and I'm very excited!
Here some debug logs, a la "trust me bro"
----
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6744 MB / 7805 MB
Data on GPU: 1641 MB
Grads on GPU: 3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6776 MB / 7805 MB
Data on GPU: 1561 MB
Grads on GPU: 3279 MB
CPU Offloaded: 18590 MB
-----------------------------Final models get exported in
safetensors format and are compatible with PyTorch and transformers, for accessibility.- [^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory
reacted to qgallouedec's post with 🔥 15 days ago
Post
2301
TRL v1.0 is out!
Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.
The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.
What's in v1.0:
Deep Hugging Face integration, low infrastructure burden
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.
Read more: hf.co/blog/trl-v1
Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.
The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.
What's in v1.0:
Deep Hugging Face integration, low infrastructure burden
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.
pip install --upgrade trlRead more: hf.co/blog/trl-v1
reacted to danielhanchen's post with 🔥 17 days ago
Post
2697
A new way to use Unsloth.
Coming soon...
Coming soon...
posted an update 17 days ago
Post
198
I'm unemployed, I have a gaming GPU, and I just published a German LLM.
qwen3-0.6b-german - fine-tuned Qwen3-0.6B in ~40h on an RTX 4070 Ti, using the exact same instruct datasets as the LLäMmlein paper (ACL 2025).
HellaSwag-DE: 0.3111 → 0.3193 ✅
ARC-DE: 0.2352 → 0.2575 ✅
MMlu-DE: 0.3600 → 0.2475 🔻 (alignment tax - known trade-off)
Instruction fine-tuning trades some factual breadth for better reasoning and format following. The model is more useful, even if not better on every metric.
Weights, LoRA adapter, full training script and logs all public.
philipp-zettl/qwen3-0.6b-german
It ain't much, but it's honest work.
qwen3-0.6b-german - fine-tuned Qwen3-0.6B in ~40h on an RTX 4070 Ti, using the exact same instruct datasets as the LLäMmlein paper (ACL 2025).
HellaSwag-DE: 0.3111 → 0.3193 ✅
ARC-DE: 0.2352 → 0.2575 ✅
MMlu-DE: 0.3600 → 0.2475 🔻 (alignment tax - known trade-off)
Instruction fine-tuning trades some factual breadth for better reasoning and format following. The model is more useful, even if not better on every metric.
Weights, LoRA adapter, full training script and logs all public.
philipp-zettl/qwen3-0.6b-german
It ain't much, but it's honest work.
reacted to SeaWolf-AI's post with 🔥 about 1 month ago
Post
11126
🏟️ Smol AI WorldCup: A 4B Model Just Beat 8B — Here's the Data
We evaluated 18 small language models from 12 makers on 125 questions across 7 languages. The results challenge the assumption that bigger is always better.
Community Article: https://huggingface.co/blog/FINAL-Bench/smol-worldcup
Live Leaderboard: ginigen-ai/smol-worldcup
Dataset: ginigen-ai/smol-worldcup
What we found:
→ Gemma-3n-E4B (4B, 2GB RAM) outscores Qwen3-8B (8B, 5.5GB). Doubling parameters gained only 0.4 points. RAM cost: 2.75x more.
→ GPT-OSS-20B fits in 1.5GB yet matches Champions-league dense models requiring 8.5GB. MoE architecture is the edge AI game-changer.
→ Thinking models hurt structured output. DeepSeek-R1-7B scores 8.7 points below same-size Qwen3-8B and runs 2.7x slower.
→ A 1.3B model fabricates confident fake content 80% of the time when prompted with nonexistent entities. Qwen3 family hits 100% trap detection across all sizes.
→ Qwen3-1.7B (1.2GB) outscores Mistral-7B, Llama-3.1-8B, and DeepSeek-R1-14B. Latest architecture at 1.7B beats older architecture at 14B.
What makes this benchmark different?
Most benchmarks ask "how smart?" — we measure five axes simultaneously: Size, Honesty, Intelligence, Fast, Thrift (SHIFT). Our ranking metric WCS = sqrt(SHIFT x PIR_norm) rewards models that are both high-quality AND efficient. Smart but massive? Low rank. Tiny but poor? Also low.
Top 5 by WCS:
1. GPT-OSS-20B — WCS 82.6 — 1.5GB — Raspberry Pi tier
2. Gemma-3n-E4B — WCS 81.8 — 2.0GB — Smartphone tier
3. Llama-4-Scout — WCS 79.3 — 240 tok/s — Fastest model
4. Qwen3-4B — WCS 76.6 — 2.8GB — Smartphone tier
5. Qwen3-1.7B — WCS 76.1 — 1.2GB — IoT tier
Built in collaboration with the FINAL Bench research team. Interoperable with ALL Bench Leaderboard for full small-to-large model comparison.
Dataset is open under Apache 2.0 (125 questions, 7 languages). We welcome new model submissions.
We evaluated 18 small language models from 12 makers on 125 questions across 7 languages. The results challenge the assumption that bigger is always better.
Community Article: https://huggingface.co/blog/FINAL-Bench/smol-worldcup
Live Leaderboard: ginigen-ai/smol-worldcup
Dataset: ginigen-ai/smol-worldcup
What we found:
→ Gemma-3n-E4B (4B, 2GB RAM) outscores Qwen3-8B (8B, 5.5GB). Doubling parameters gained only 0.4 points. RAM cost: 2.75x more.
→ GPT-OSS-20B fits in 1.5GB yet matches Champions-league dense models requiring 8.5GB. MoE architecture is the edge AI game-changer.
→ Thinking models hurt structured output. DeepSeek-R1-7B scores 8.7 points below same-size Qwen3-8B and runs 2.7x slower.
→ A 1.3B model fabricates confident fake content 80% of the time when prompted with nonexistent entities. Qwen3 family hits 100% trap detection across all sizes.
→ Qwen3-1.7B (1.2GB) outscores Mistral-7B, Llama-3.1-8B, and DeepSeek-R1-14B. Latest architecture at 1.7B beats older architecture at 14B.
What makes this benchmark different?
Most benchmarks ask "how smart?" — we measure five axes simultaneously: Size, Honesty, Intelligence, Fast, Thrift (SHIFT). Our ranking metric WCS = sqrt(SHIFT x PIR_norm) rewards models that are both high-quality AND efficient. Smart but massive? Low rank. Tiny but poor? Also low.
Top 5 by WCS:
1. GPT-OSS-20B — WCS 82.6 — 1.5GB — Raspberry Pi tier
2. Gemma-3n-E4B — WCS 81.8 — 2.0GB — Smartphone tier
3. Llama-4-Scout — WCS 79.3 — 240 tok/s — Fastest model
4. Qwen3-4B — WCS 76.6 — 2.8GB — Smartphone tier
5. Qwen3-1.7B — WCS 76.1 — 1.2GB — IoT tier
Built in collaboration with the FINAL Bench research team. Interoperable with ALL Bench Leaderboard for full small-to-large model comparison.
Dataset is open under Apache 2.0 (125 questions, 7 languages). We welcome new model submissions.
reacted to marksverdhei's post with 🤗 3 months ago
Post
2685
Dear Hugging Face team, can we please have a way to archive hf repositories / spaces? I have a bunch of spaces that used to work but don't any more due to the hf space implementations changing and i think it would be good if I could archive those like in GitHub.
React to this post if you want to see this feature! 💡
React to this post if you want to see this feature! 💡
reacted to codelion's post with 🔥 4 months ago
Post
6156
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m
reacted to tomaarsen's post with ❤️ 6 months ago
Post
4583
🤗 Sentence Transformers is joining Hugging Face! 🤗 This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face! Details:
Today, the Ubiquitous Knowledge Processing (UKP) Lab is transferring the project to Hugging Face. Sentence Transformers will remain a community-driven, open-source project, with the same open-source license (Apache 2.0) as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged. The project will continue to prioritize transparency, collaboration, and broad accessibility.
Read our full announcement for more details and quotes from UKP and Hugging Face leadership: https://huggingface.co/blog/sentence-transformers-joins-hf
We see an increasing wish from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.
I would like to thank the UKP Lab, and especially Nils Reimers and Iryna Gurevych, both for their dedication to the project and for their trust in myself, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to take the project to new heights. That choice ended up being very valuable for the embedding & Information Retrieval community, and I think this choice of granting Hugging Face stewardship will be similarly successful.
I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!
Today, the Ubiquitous Knowledge Processing (UKP) Lab is transferring the project to Hugging Face. Sentence Transformers will remain a community-driven, open-source project, with the same open-source license (Apache 2.0) as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged. The project will continue to prioritize transparency, collaboration, and broad accessibility.
Read our full announcement for more details and quotes from UKP and Hugging Face leadership: https://huggingface.co/blog/sentence-transformers-joins-hf
We see an increasing wish from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.
I would like to thank the UKP Lab, and especially Nils Reimers and Iryna Gurevych, both for their dedication to the project and for their trust in myself, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to take the project to new heights. That choice ended up being very valuable for the embedding & Information Retrieval community, and I think this choice of granting Hugging Face stewardship will be similarly successful.
I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!
reacted to sergiopaniego's post with 🔥 6 months ago
Post
3014
A few days ago, Thinking Machines Lab released “LoRA Without Regret”, showing that LoRA can match full fine-tuning performance when configured right.
Naturally, we decided to reproduce the results with TRL and release a guide!
https://huggingface.co/docs/trl/main/en/lora_without_regret
Naturally, we decided to reproduce the results with TRL and release a guide!
https://huggingface.co/docs/trl/main/en/lora_without_regret
reacted to yjernite's post with 🤗 9 months ago
Post
4289
𝗙𝗶𝗿𝘀𝘁 𝗚𝗣𝗔𝗜 𝗠𝗼𝗱𝗲𝗹 𝘄𝗶𝘁𝗵 𝗘𝗨 𝗗𝗮𝘁𝗮 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝗰𝘆 𝗧𝗲𝗺𝗽𝗹𝗮𝘁𝗲? 🇪🇺
With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)
The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀
In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for the SmolLM3 - which my colleagues at Hugging Face earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)
Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency 🔍
To learn more have a look at:
- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency
With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)
The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀
In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for the SmolLM3 - which my colleagues at Hugging Face earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)
Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency 🔍
To learn more have a look at:
- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency
reacted to schuler's post with 🔥 about 1 year ago
Post
7273
📢 New Research Alert: Making Language Models Smaller & Smarter!
Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.
The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.
🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.
Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.
The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.
🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.
Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
reacted to hexgrad's post with 🔥 about 1 year ago
Post
8882
hexgrad/Kokoro-82M got an upgrade! ⬆️ More voices, more languages,
GitHub: https://github.com/hexgrad/kokoro
PyPI: https://pypi.org/project/kokoro/
Space: hexgrad/Kokoro-TTS
pip install kokoro, and still 82M parameters.GitHub: https://github.com/hexgrad/kokoro
PyPI: https://pypi.org/project/kokoro/
Space: hexgrad/Kokoro-TTS
reacted to mitkox's post with 🚀 about 1 year ago
Post
3926
llama.cpp is 26.8% faster than ollama.
I have upgraded both, and using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an Apples to Apples comparison.
Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec
Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms
Prompt processing
llama.cpp 416.04 tokens/s with an eval time 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms
Token generation
llama.cpp 137.79 tokens/s with an eval time 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time 7.64 sec
llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.
Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
I have upgraded both, and using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an Apples to Apples comparison.
Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec
Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms
Prompt processing
llama.cpp 416.04 tokens/s with an eval time 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms
Token generation
llama.cpp 137.79 tokens/s with an eval time 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time 7.64 sec
llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.
Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
reacted to fdaudens's post with ❤️ about 1 year ago
Post
9971
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into +550 NEW models on Hugging Face. Total downloads? 2.5M—nearly 5X the originals.
The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.
When you empower builders, innovation explodes. For everyone. 🚀
The most popular community model? @bartowski 's DeepSeek-R1-Distill-Qwen-32B-GGUF version — 1M downloads alone.
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into +550 NEW models on Hugging Face. Total downloads? 2.5M—nearly 5X the originals.
The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.
When you empower builders, innovation explodes. For everyone. 🚀
The most popular community model? @bartowski 's DeepSeek-R1-Distill-Qwen-32B-GGUF version — 1M downloads alone.
reacted to lewtun's post with 🚀 over 1 year ago
Post
7094
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.
🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM
Here's the links:
- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute
- Code: https://github.com/huggingface/search-and-learn
Enjoy!
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.
🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM
Here's the links:
- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute
- Code: https://github.com/huggingface/search-and-learn
Enjoy!
reacted to lhoestq's post with ❤️ over 1 year ago
Post
3281
Made a HF Dataset editor a la gg sheets here: lhoestq/dataset-spreadsheets
With Dataset Spreadsheets:
✏️ Edit datasets in the UI
🔗 Share link with collaborators
🐍 Use locally in DuckDB or Python
Available for the 100,000+ parquet datasets on HF :)
With Dataset Spreadsheets:
✏️ Edit datasets in the UI
🔗 Share link with collaborators
🐍 Use locally in DuckDB or Python
Available for the 100,000+ parquet datasets on HF :)
reacted to merve's post with ❤️ over 1 year ago
Post
5756
This week in open-source AI was insane 🤠 A small recap🕺🏻 merve/dec-6-releases-67545caebe9fc4776faac0a3
Multimodal 🖼️
> Google shipped a PaliGemma 2, new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants 👏
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with sota checkpoint with MIT license ✨
> Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts
LLMs 💬
> Meta released a new iteration of Llama 70B, Llama3.2-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! 🔥 nearly 8TB pretraining data in many languages!
Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux
Audio 🔊
> Indic-Parler-TTS is a new text2speech model made by community
Multimodal 🖼️
> Google shipped a PaliGemma 2, new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants 👏
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with sota checkpoint with MIT license ✨
> Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts
LLMs 💬
> Meta released a new iteration of Llama 70B, Llama3.2-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! 🔥 nearly 8TB pretraining data in many languages!
Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux
Audio 🔊
> Indic-Parler-TTS is a new text2speech model made by community
reacted to christopher's post with 🔥 over 1 year ago
Post
2455
The Lichess database of games, puzzles, and engine evaluations is now on the Hub:
Lichess
Billions of chess data points to download, query, and stream and we're excited to see what you'll build with it! ♟️ 🤗
- https://huggingface.co/collections/Lichess/positions-datasets-66f50837db5cd3287d60d489
- https://huggingface.co/collections/Lichess/games-datasets-66f508df78f4b43e1bb2d353
Billions of chess data points to download, query, and stream and we're excited to see what you'll build with it! ♟️ 🤗
- https://huggingface.co/collections/Lichess/positions-datasets-66f50837db5cd3287d60d489
- https://huggingface.co/collections/Lichess/games-datasets-66f508df78f4b43e1bb2d353