Daily Papers
https://huggingface.co/papers
Listen to an AI-generated conversation about the most upvoted research paper on Hugging Face each day. Still in beta: it's an experiment! Discussions are AI-generated, so verify facts before citing.
en-us
Hugging Face
Each day, this podcast dives into the top trending ML paper on Hugging Face.
HF florent.daudens@hf.co
Fri, 06 Jun 2025 14:48:52 +0000

One-Step Video Restoration Breakthrough with SeedVR2
SeedVR2 is a groundbreaking approach to video restoration that achieves high visual fidelity at a significantly reduced computational cost. It combines an adaptive window attention mechanism with adversarial post-training to restore degraded videos in a single step, outperforming traditional multi-step diffusion-based models. <a href="https://huggingface.co/papers/2506.05301">[Read the paper on Hugging Face]</a>
Fri, 06 Jun 2025 14:48:52 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-06-06.wav

Breaking AI Records: MiMo-VL-7B Revolutionizes Vision-Language Understanding
We explore MiMo-VL-7B, a groundbreaking vision-language model that outperforms state-of-the-art models in both general visual understanding and multimodal reasoning, and uncover the keys to its success: a four-stage pre-training phase and a Mixed On-policy Reinforcement Learning framework. <a href="https://huggingface.co/papers/2506.03569">[Read the paper on Hugging Face]</a>
Thu, 05 Jun 2025 13:39:40 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-06-05.wav

High-Entropy Minority Tokens Drive Effective RLVR in LLMs
We dive into the largely uncharted territory of Reinforcement Learning with Verifiable Rewards (RLVR) and explore how a small set of high-entropy tokens can dramatically improve the reasoning capabilities of Large Language Models. By examining token entropy patterns in Chain-of-Thought (CoT) reasoning, we uncover the crucial role these high-entropy minority tokens play in steering model behavior and discover a novel approach to optimizing RLVR training. <a href="https://huggingface.co/papers/2506.01939">[Read the paper on Hugging Face]</a>
Tue, 03 Jun 2025 14:27:21 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-06-03.wav

Does RL Truly Unlock New Reasoning Capabilities in Large Language Models?
We examine the limitations of prior research and challenge prevailing assumptions about reinforcement learning (RL) in large language models, showing that prolonged RL training can genuinely expand a model's reasoning capabilities and uncover novel solution pathways, rather than merely optimizing knowledge the model already has. <a href="https://huggingface.co/papers/2505.24864">[Read the paper on Hugging Face]</a>
Mon, 02 Jun 2025 14:26:51 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-06-02.wav

RL's Hidden Enemy: Uncovering the Collapse of Policy Entropy
When training large language models with reinforcement learning, researchers have run into a major obstacle: the sudden collapse of policy entropy. This phenomenon, observed consistently across a wide range of RL runs without entropy intervention, can make the policy model overconfident, creating an exploration ceiling that hinders further improvement.
<a href="https://huggingface.co/papers/2505.22617">[Read the paper on Hugging Face]</a>
Thu, 29 May 2025 21:01:43 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-29.wav

Unraveling the Limitations of Logical Reasoning in Multimodal Large Language Models
MME-Reasoning sheds light on critical shortcomings and performance imbalances in current multimodal large language models, and it answers a pressing need for a comprehensive benchmark that thoroughly evaluates their logical reasoning abilities. <a href="https://huggingface.co/papers/2505.21327">[Read the paper on Hugging Face]</a>
Wed, 28 May 2025 13:37:37 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-28.wav

Breaking Language Barriers: Introducing Mutarjim, the Revolutionary Arabic-English Translation Model
Discover the new language model that's taking machine translation by storm, delivering significantly improved accuracy and efficiency while also reducing computational costs and training requirements. In this episode, we explore the technology behind Mutarjim and introduce Tarjama-25, a new benchmark dataset designed to push the limits of Arabic-English translation evaluation. <a href="https://huggingface.co/papers/2505.17894">[Read the paper on Hugging Face]</a>
Tue, 27 May 2025 17:15:54 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-27.wav

Harnessing Semantic Insight with TabSTAR: Revolutionizing Tabular Learning
TabSTAR marks a significant breakthrough in tabular learning. It combines deep learning with language model capabilities to achieve state-of-the-art performance on classification tasks involving text features, and it points to a pathway for further performance improvements.
<a href="https://huggingface.co/papers/2505.18125">[Read the paper on Hugging Face]</a>
Mon, 26 May 2025 17:40:37 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-26.wav

Innovate Smarter: Revolutionizing Scientific Research with NovelSeek
NovelSeek is a groundbreaking artificial intelligence framework that's changing the game in scientific research. Imagine having an entire team of experts at your fingertips, capable of generating novel ideas, designing experiments, and analyzing results in a seamless loop, all in a fraction of the time it takes humans to achieve similar results. <a href="https://huggingface.co/papers/2505.16938">[Read the paper on Hugging Face]</a>
Fri, 23 May 2025 14:14:53 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-23.wav

WEB-SHEPHERD: Revolutionizing Web Navigation with AI Precision
We explore WEB-SHEPHERD, a process reward model that evaluates web navigation trajectories at the step level, and its potential to transform AI-driven web navigation. <a href="https://huggingface.co/papers/2505.15277">[Read the paper on Hugging Face]</a>
Thu, 22 May 2025 15:07:32 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-22.wav

BAGEL Makes Waves in Multimodal Pretraining
BAGEL, a groundbreaking new multimodal foundation model, is changing how we approach unified multimodal understanding and generation. Trained on trillions of tokens curated from diverse multimodal data, BAGEL exhibits emerging properties in complex multimodal reasoning. It handles tasks like free-form image manipulation and future frame prediction while outperforming top-tier open-source VLMs on standard benchmarks.
<a href="https://huggingface.co/papers/2505.14683">[Read the paper on Hugging Face]</a>
Wed, 21 May 2025 14:28:49 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-21.wav

Solving the Scaling Paradox with Chain-of-Language-Model
We explore Chain-of-Language-Model (CoLM), a novel approach to scaling up language models that preserves training efficiency and enables elastic inference by integrating multi-scale training within a single forward pass.
Tue, 20 May 2025 18:46:38 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-20.wav

Beyond Emergent 'Aha!' Moments: Unlocking Reliable Reasoning in Large Models
We explore a novel approach to boosting the reliability and scalability of large reasoning models by explicitly aligning them with classical reasoning meta-abilities. The proposed three-stage pipeline improves performance by over 10% and enables domain-specific reinforcement learning from the aligned checkpoints, providing a scalable and dependable foundation for reasoning.
Fri, 16 May 2025 14:24:17 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-16.wav

MiniMax-Speech Revolutionizes Text-to-Speech with Zero-Shot Voice Cloning
MiniMax-Speech is a groundbreaking Text-to-Speech model that generates high-quality speech nearly indistinguishable from a human voice, achieving state-of-the-art results on multiple objective and subjective evaluation metrics. The model introduces a learnable speaker encoder module, enabling zero-shot voice cloning and exceptional speaker similarity in one-shot scenarios.
Thu, 15 May 2025 15:59:41 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-15.wav

Seed1.5-VL: Smarter, Leaner, Sharper
Seed1.5-VL pairs a compact vision encoder with a Mixture-of-Experts (MoE) language model to crush reasoning benchmarks.
Trained on diverse data and with verifiable rewards, it offers a glimpse of efficient, real-world AI, with a few sci-fi vibes.
Wed, 14 May 2025 20:00:00 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-14.wav

Step1X-3D: From Scrap Models to Masterpieces
Today's episode dives into Step1X-3D, a new open-source method that cleans noisy 3D data, bridges 2D and 3D generation, and rivals top proprietary tools. From mesh repair to texture-perfect diffusion, it's a major leap for 3D AI.
Tue, 13 May 2025 10:00:00 +0000
https://huggingface.co/spaces/fdaudens/podcast-jobs/resolve/main/podcasts/podcast-2025-05-13.wav