RAG is evolving fast, keeping pace with cutting-edge AI trends. It's becoming more agentic and smarter at navigating complex structures like hypergraphs.
Just tested something this morning that feels kind of game-changing for how we publish, discover, and consume news with AI: connecting Claude directly to the New York Times through MCP.
Picture this: You ask Claude about a topic, and it instantly pulls verified and trusted NYT content — no more guessing if the info is accurate.
The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.
It’s still a bit technical to set up right now, but this could get super simple soon — like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.
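For the curious, here's a minimal sketch of the shape of such a publisher-side MCP server, assuming the official MCP Python SDK and NYT's public Article Search API; the server name and tool are my illustration, not an official NYT integration:

```python
# Minimal sketch of a publisher-side MCP server (illustrative, not official).
# Assumptions: `mcp` and `httpx` installed; you have an NYT API key.
from mcp.server.fastmcp import FastMCP
import httpx

mcp = FastMCP("nyt-news")  # server name shown to the MCP client

@mcp.tool()
def search_articles(query: str) -> str:
    """Search NYT articles via the public Article Search API."""
    resp = httpx.get(
        "https://api.nytimes.com/svc/search/v2/articlesearch.json",
        params={"q": query, "api-key": "YOUR_NYT_API_KEY"},  # placeholder key
    )
    docs = resp.json()["response"]["docs"]
    return "\n".join(d["headline"]["main"] for d in docs[:5])

if __name__ == "__main__":
    mcp.run()  # Claude connects to this server over stdio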
Not saying it solves everything, but it’s definitely a new way to distribute content — and maybe even find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.
Curious what folks think — could MCPs be a real opportunity for journalism?
Over the last couple of weeks, a wave of studies on inference-time scaling has emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale the models' parameter count.
So here are 13 new methods + 3 comprehensive studies on test-time scaling:
3. Z1: Efficient Test-time Scaling with Code (2504.00810)
Proposes training LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window.
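To ground the general idea, here's a minimal sketch of the simplest test-time scaling trick, best-of-N sampling: sample several candidates and keep the best one under a scorer. The model id is just an example, and the toy heuristic stands in for a real reward model or verifier:

```python
# Minimal best-of-N sampling sketch: more inference compute, same weights.
# Assumptions: any causal LM works; the scoring function is a toy placeholder.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = generator(
        prompt, do_sample=True, temperature=0.8,
        max_new_tokens=128, num_return_sequences=n,
    )
    # Score each candidate; a real setup would use a learned verifier here.
    def score(text: str) -> float:
        return -len(text)  # toy heuristic: prefer concise answers
    return max((c["generated_text"] for c in candidates), key=score)

print(best_of_n("What is 17 * 24? Think step by step."))
```

The papers above go far beyond this, but they all pull the same lever: more compute per query, same model.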
Want AI that truly understands your country's culture? Public institutions are sitting on the next AI revolution - and here's the practical guide to unlock it.
I've had fascinating conversations recently about sovereign AI, with people trying to solve this recurring question: "How do we build AI that truly understands our culture?"
This guide by @evijit and @yjernite brings lots of insights about this question. It's not just about throwing data at models. It's about partnering cultural expertise with tech infrastructure in ways we're just starting to figure out.
An example? The National Library of Norway already has 150+ AI models on Hugging Face. They're not just digitizing books - they're building AI that thinks in Norwegian, understands Norwegian values, and serves Norwegian citizens.
This is sovereign AI in practice: technology that understands your culture, values, and languages.
Especially loved the practical examples on how to do this:
- Real examples from museums, libraries, and government agencies
- How to convert complex documents (PDFs, PowerPoints) into ML-ready formats
- Code templates for processing public data (see the sketch below)
- Technical recipes for sharing datasets on open platforms
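In that spirit, here's a minimal sketch of the PDFs-to-dataset step, assuming pypdf and datasets are installed; the folder path and repo id are placeholders:

```python
# Minimal sketch: a folder of PDFs -> ML-ready dataset on the Hub.
# Assumptions: pypdf + datasets installed; path and repo id are placeholders.
from pathlib import Path
from pypdf import PdfReader
from datasets import Dataset

records = []
for pdf_path in Path("public_docs").glob("*.pdf"):
    reader = PdfReader(pdf_path)
    # Concatenate the extracted text of every page (empty string if none)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    records.append({"source": pdf_path.name, "text": text})

ds = Dataset.from_list(records)
ds.push_to_hub("your-institution/public-documents")  # placeholder repo id
```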
The stakes? Citizens' ability to leverage their collective digital intelligence.
The technology is ready. The infrastructure exists. The guide shows exactly how to use it. What's needed is your cultural expertise to shape these tools.
Do chatbots lie about Céline Dion? We now have answers, not speculation.
Ai2 just released OLMoTrace and it's a game-changer for transparency. You can literally see where an AI's responses come from in its training data - in real time.
The demo shows results about Céline. So I tried it out myself! Watch what happens in the video.
For journalists, researchers studying hallucinations, and anyone who needs to trust their AI, this is like getting X-ray vision into AI systems. When the model made claims, I could instantly verify them against original sources. When it hallucinated, I could see why.
You can finally 1) understand how LLMs actually work and 2) verify if what they're saying is true. No more blind trust.
This pushes the open data movement to the next level.
🎨 Designers, meet OmniSVG! This new model helps you:
- Create professional vector graphics from text/images
- Generate editable SVGs, from icons to detailed characters
- Convert rasters to vectors
- Maintain style consistency with references
- Integrate it all into your workflow
I read the 456-page AI Index report so you don't have to (kidding). The wild part? While AI gets ridiculously more accessible, the power gap is actually widening:
1️⃣ The democratization of AI capabilities is accelerating rapidly:
- The gap between open and closed models has basically closed: the difference on benchmarks like MMLU and HumanEval shrank to just 1.7% in 2024
- The cost to run GPT-3.5-level performance dropped 280x in 2 years
- Model size is shrinking while performance holds: Phi-3-mini hits 60%+ MMLU at a fraction of the parameters of early models like PaLM
2️⃣ But we're seeing concerning divides deepening:
- Geographic: US private investment ($109B) dwarfs everyone else's, at 12x China's $9.3B
- Research concentration: the US and China dominate highly cited papers (50 and 34 respectively in 2023), while the next closest country has only 7
- Gender: major gaps in AI skill penetration rates, with the US showing a 2.39 vs 1.71 male/female ratio
The tech is getting more accessible but the benefits aren't being distributed evenly. Worth thinking about as these tools become more central to the economy.
AI agents are transforming how we interact with technology, but how sustainable are they? 🌍
Design choices — like model size and structure — can massively impact energy use and cost. ⚡💰 The key takeaway: smaller, task-specific models can be far more efficient than large, general-purpose ones.
🔑 Open-source models offer greater transparency, allowing us to track energy consumption and make more informed decisions on deployment. 🌱 Open-source = more efficient, eco-friendly, and accountable AI.
AI inference is the process by which AI models generate predictions, classifications, or decisions from input data using pre-trained weights. It encompasses a wide range of approaches with different computational methods and deployment strategies.
Firstly, here are 5 inference types, based on how the model reasons:
1. Probabilistic inference -> https://arxiv.org/pdf/2502.05244
Uses probability theory to reason under uncertainty. The system maintains degrees of belief over hypotheses and updates them as evidence comes in (see the Bayesian-update sketch after this list).
3. Logical inference -> https://arxiv.org/abs/2009.03393
Uses formal logic to draw conclusions that are guaranteed true if the premises are. It supports theorem proving, logic programming, and tasks needing correctness, like software verification.
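Here's the promised sketch of the belief-update mechanism behind item 1: a single Bayesian update, with made-up numbers purely for illustration:

```python
# Bayesian update: beliefs over hypotheses revised as evidence arrives.
# Toy numbers: is an email spam, given that it contains the word "free"?
prior = {"spam": 0.3, "ham": 0.7}         # P(H): belief before the evidence
likelihood = {"spam": 0.6, "ham": 0.1}    # P(evidence | H)

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())         # P(evidence)
posterior = {h: p / total for h, p in unnormalized.items()}

print(posterior)  # {'spam': 0.72, 'ham': 0.28}: belief shifted toward spam
```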
See that purple banner on the Llama 4 models? It's Xet storage, and this is actually huge for anyone building with AI models. Let's geek out a little bit 🤓
Current problem: AI models are massive files stored with Git LFS. But with models getting bigger and downloads exploding, we needed something better. Xet lets you version large files like code, with compression and deduplication, all Git-compatible. That means less bandwidth, faster sharing, and smoother collaboration.
Real numbers: ~25% deduplication on Llama 4 models, hitting ~40% for finetunes.
Scale matters here - the Hub served 2B model downloads in 30 days, Llama models alone at 60M. The upcoming Llama 4 Behemoth has 2T parameters! Xet's chunk-based system was built exactly for this.
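To make "chunk-based" concrete, here's a toy content-defined chunking sketch. This is not Xet's actual algorithm, just an illustration of how dedup numbers like the ones above arise: chunk boundaries depend only on local content, so a small edit leaves most chunks, and therefore most stored bytes, unchanged:

```python
# Toy content-defined chunking + dedup (NOT Xet's actual algorithm).
import hashlib, random

def chunks(data: bytes, window: int = 16, mask: int = 0x3FF) -> list[bytes]:
    """Cut a chunk where a hash of the last `window` bytes matches a pattern,
    so boundaries depend only on local content (~1 KB average chunks)."""
    out, start = [], 0
    for i in range(window, len(data)):
        h = int.from_bytes(hashlib.sha256(data[i - window:i]).digest()[:4], "big")
        if (h & mask) == 0 and i - start >= window:
            out.append(data[start:i])
            start = i
    out.append(data[start:])
    return out

def dedup_ratio(old: bytes, new: bytes) -> float:
    """Fraction of `new`'s bytes living in chunks already stored for `old`."""
    seen = {hashlib.sha256(c).digest() for c in chunks(old)}
    new_chunks = chunks(new)
    reused = sum(len(c) for c in new_chunks if hashlib.sha256(c).digest() in seen)
    return reused / len(new)

random.seed(0)
base = bytes(random.getrandbits(8) for _ in range(200_000))  # "base model" blob
finetune = base[:100_000] + b"small tweak" + base[100_000:]  # tiny local edit
print(f"{dedup_ratio(base, finetune):.0%} of the finetune reuses existing chunks")
```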
This is the kind of engineering that makes the next wave of large models actually usable. Kudos to the team! 🧨
"Am I going to be replaced by AI?" - Crucial question, but maybe we're asking the wrong one.
📈 One statistic from my reading this week stays with me: Tomer Cohen, LinkedIn's CPO, tells Jeremy Kahn that 70% of the skills used in most jobs will change by 2030. Not jobs disappearing, but transforming. And he calls out bad leadership: "If in one year's time, you are disappointed that your workforce is not 'AI native,' it is your fault."
🔄 Apparently, the Great Recalibration has begun. According to a piece in Fast Company, we're heading into an era where AI is fundamentally redefining the nature of work itself by forcing a complete reassessment of human value in the workplace. But it might be driven more by "the need for humans to change the way they work" than by AI.
⚡ The Washington Post draws a crucial parallel: we're facing an "AI shock" similar to manufacturing's "China shock", but this one hits knowledge workers. Entry-level white-collar work, especially, could get automated. The key difference? "Winning the AI tech competition with other countries won't be enough. It's equally vital to win the battle to re-skill workers."
Did we just drop personalized AI evaluation?! This tool auto-generates custom benchmarks from your docs to test which models are the best.
Most benchmarks test general capabilities, but what matters is how models handle your data and tasks. YourBench helps answer critical questions like:
- Do you really need a hundreds-of-billions-parameter model sledgehammer to crack a nut?
- Could a smaller, fine-tuned model work better?
- How well do different models understand your domain?
Some cool features:
📚 Generates custom benchmarks from your own documents (PDFs, Word, HTML)
🎯 Tests models on real tasks, not just general capabilities
🔄 Supports multiple models for different pipeline stages
🧠 Generates both single-hop and multi-hop questions
🔍 Evaluates top models and deploys leaderboards instantly
💰 Full cost analysis to optimize for your budget
🛠️ Fully configurable via a single YAML file
26 SOTA models tested for question generation. Interesting finding: Qwen2.5 32B leads in question diversity, while smaller Qwen models and Gemini 2.0 Flash offer great value for cost.
You can also run it locally on any model you want.
Want to vibecode with DeepSeek? Just spent 10 minutes with this space and created a full world indicators dashboard - literally just by describing what I wanted!
Anyone can now prototype and deploy projects instantly.
How can Chain-of-Thought (CoT) prompting unlock models' full potential across images, video, audio, and more? The answer lies in specialized multimodal CoT techniques.
Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source:
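Before the methods, here's the basic shape they all build on: a minimal multimodal CoT sketch with an off-the-shelf vision-language model. The model id and image URL are just example placeholders:

```python
# Minimal multimodal CoT sketch: ask a VLM to reason step by step about an image.
# Assumptions: recent transformers; model id and image URL are placeholders.
from transformers import pipeline

vlm = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},
        {"type": "text", "text": "How many bars exceed 50? Think step by step."},
    ],
}]

out = vlm(text=messages, max_new_tokens=256, return_full_text=False)
print(out[0]["generated_text"])  # step-by-step reasoning, then the answer
```

The 9 methods refine exactly this loop: where the reasoning happens, in which modality, and how intermediate steps get verified.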
Want to ramp up your AI skills and start breaking bigger stories? With the Journalists on Hugging Face community, we're launching our first learn-together course!
We'll build AI classifiers that process months of data in minutes. How?
- Work through an interactive version of an excellent course developed by Ben Welsh and Derek Willis - Share findings and get help in our dedicated community channel - Build working classifiers you can use in your reporting today
No coding background needed - if you can write a ChatGPT or Claude prompt, you can do this. Journalists are already using these techniques to break stories, from uncovering hidden real estate deals to tracking unusual campaign spending.
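To give you a feel for it, here's roughly what such a classifier looks like: a minimal sketch assuming an OpenAI-compatible API; the categories and record are invented examples, not the course's code:

```python
# Minimal prompt-based classifier sketch (illustrative, not the course's code).
# Assumptions: OPENAI_API_KEY is set; categories/record are invented examples.
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["real estate", "campaign spending", "other"]

def classify(record: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Classify this record as one of {CATEGORIES}. "
                       f"Answer with the category only.\n\nRecord: {record}",
        }],
        temperature=0,  # deterministic labels for reproducible reporting
    )
    return resp.choices[0].message.content.strip()

print(classify("$1.2M transfer to Shell Corp LLC for a downtown parcel"))
```

Loop that over a spreadsheet of records and you've processed months of data in minutes.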
Join us—it might give you your next big story!
Thanks to Ben and Derek for letting me adapt their excellent course into this interactive version!
Since we use Transformers all the time, it's helpful to understand RoPE—Rotary Position Embedding. Token order matters, and RoPE encodes it by rotating token embeddings based on their position, so the model knows how to interpret which token comes first, second, and so on.
Here are 8 types of RoPE that can be implemented in different cases (a minimal sketch of vanilla RoPE follows the list):
4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes positional embedding into 3 components: temporal, height and width, so that positional features are aligned across modalities: text, images and videos.
8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
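And here's the promised sketch: vanilla RoPE in the common rotate-half form, in plain NumPy:

```python
# Minimal vanilla RoPE sketch (rotate-half variant), plain NumPy.
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate each pair of dimensions of a (seq_len, dim) embedding matrix
    by an angle that grows with the token's position."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)   # theta_i = base^(-2i/dim)
    angles = np.outer(np.arange(seq_len), freqs)     # angle = position * theta_i
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied to each (x1_i, x2_i) pair
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(10, 64)   # 10 tokens, 64-dim query vectors
q_rot = rope(q)               # same shape, position now baked into the values
```

All 8 variants above tweak this same rotation: changing the frequencies, adding decay, or splitting the components across modalities.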
🎥 Just tested Stability AI's Stable Virtual Camera - it turns a single photo into dynamic video with AI-powered camera movements! From static meeting room to cinematic sweeps. 🚀