While a fix is being implemented (https://github.com/ggml-org/llama.cpp/pull/12957), I want to leave the models up for visibility and continued discussion, but I also want to prevent accidental downloads of known-broken models (even though there are runtime settings that can work around the issue for now).
With that goal in mind, I've enabled access requests. I don't actually want your data, and I'm sorry there doesn't seem to be a way around that, but that's the plan for now. I'll remove the gate once the fix is merged and verified and I've had a chance to re-convert and re-quantize!
After diving into the latest benchmark results, it’s clear that Meta’s new Llama 4 lineup (Maverick, Scout, and Behemoth) is no joke.
Here are a few standout highlights 🔍:

Llama 4 Maverick hits the sweet spot between cost and performance:
- Outperforms GPT-4o in image tasks like ChartQA (90.0 vs 85.7) and DocVQA (94.4 vs 92.8)
- Beats others in MathVista and MMLU Pro too, at a fraction of the cost ($0.19–$0.49 vs $4.38 🤯)

Llama 4 Scout is lean, cost-efficient, and surprisingly capable:
- Strong performance across image and language tasks (e.g. ChartQA: 88.8, DocVQA: 94.4)
- More affordable than most competitors, and still beats out larger models like Gemini 2.0 Flash-Lite

Llama 4 Behemoth is the heavy hitter:
- Tops the charts in LiveCodeBench (49.4), MATH-500 (95.0), and MMLU Pro (82.2)
- Even edges out Claude 3 Sonnet and Gemini 2 Pro in multiple areas
Meta didn’t just show up; they delivered across multimodal, coding, reasoning, and multilingual benchmarks.
And honestly? Seeing this level of performance, especially at lower inference costs, is a big deal for anyone building on LLMs.
Curious to see how these models do in real-world apps next.