Ed Addario (eaddario)

HF community survey: What is an acceptable Perplexity (PPL) degradation?

An area of personal research is finding ways to shrink LLMs without incurring a noticeable loss of capability. All the models in my repo have been generated by quantizing different tensors at different levels, based on how much each tensor influences the inference process (see each model's card for details). This approach produces, on average, a ~10% size reduction with a < 1% PPL penalty.
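As a rough illustration of the idea (not the exact recipe used for the models in the repo), an influence-guided scheme might rank tensors by an importance score and hand higher-precision quant types to the most influential ones. In the sketch below, the scores, the quant-type ladder, and the bucketing rule are all hypothetical:

```python
# Hypothetical sketch: rank tensors by an importance score (e.g. derived from
# an importance matrix) and give the most influential tensors higher-precision
# quant types. Scores, ladder, and bucketing are illustrative assumptions.

QUANT_LADDER = ["Q3_K", "Q4_K", "Q5_K", "Q6_K"]  # lowest -> highest precision

def assign_quant_types(importance: dict[str, float]) -> dict[str, str]:
    """Map each tensor name to a quant type by its importance rank."""
    ranked = sorted(importance, key=importance.get)  # least influential first
    plan = {}
    for rank, name in enumerate(ranked):
        bucket = rank * len(QUANT_LADDER) // len(ranked)  # even split of ranks
        plan[name] = QUANT_LADDER[bucket]
    return plan

# Hypothetical scores: attention output judged more influential than FFN gate
plan = assign_quant_types({
    "blk.0.attn_output.weight": 0.92,
    "blk.0.ffn_gate.weight": 0.31,
})
print(plan)  # {'blk.0.ffn_gate.weight': 'Q3_K', 'blk.0.attn_output.weight': 'Q5_K'}
```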

I'm now focusing on pruning (whole-layer removal) as a way to achieve a better size reduction, but this comes at the cost of much higher PPL degradation.
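For anyone unfamiliar with the technique, whole-layer pruning amounts to deleting entire decoder blocks and renumbering the rest. A minimal sketch, assuming a Llama-style transformers model; the checkpoint name and the set of dropped layers are hypothetical:

```python
# Sketch of whole-layer pruning on a Llama-style transformers model. The model
# name and the layer indices are hypothetical; in practice the layers would be
# chosen by how little they influence inference.
import torch
from transformers import AutoModelForCausalLM

MODEL = "some-org/some-base-model"   # hypothetical checkpoint
LAYERS_TO_DROP = {20, 21, 22}        # hypothetical low-influence layers

model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

# Keep the remaining decoder layers and renumber them
kept = [l for i, l in enumerate(model.model.layers) if i not in LAYERS_TO_DROP]
for i, layer in enumerate(kept):
    layer.self_attn.layer_idx = i    # keep KV-cache indexing consistent
model.model.layers = torch.nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

model.save_pretrained("pruned-model")  # smaller checkpoint, higher PPL expected
```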

So, the question for the HF community is: what is the lowest (i.e. worst) PPL correlation coefficient (𝜌PPL) you'd consider acceptable for a quantized model? 99%? 95%? 90%?

To clarify, by 𝜌PPL I mean the Cor(ln(PPL(Q)), ln(PPL(base))) statistic reported by llama-perplexity.
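In other words, it's the Pearson correlation between the log-perplexities of the quantized and base models over the same evaluation text. A toy reproduction with numpy, assuming aligned per-chunk PPL values (the numbers below are made up):

```python
# Sketch: reproduce Cor(ln(PPL(Q)), ln(PPL(base))) from aligned per-chunk
# perplexities. llama-perplexity computes this internally; the PPL values
# below are made-up stand-ins for the two models' outputs.
import numpy as np

ppl_base = np.array([6.91, 7.34, 6.58, 8.02, 7.11])  # base model, per chunk
ppl_q    = np.array([7.02, 7.51, 6.70, 8.31, 7.25])  # quantized model, per chunk

rho_ppl = np.corrcoef(np.log(ppl_q), np.log(ppl_base))[0, 1]  # Pearson rho
print(f"rho_PPL = {rho_ppl:.2%}")
```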