Mitigating Catastrophic Forgetting in Mathematical Reasoning Finetuning through Mixed Training Paper • 2512.13706 • Published 21 days ago • 1
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms Paper • 2510.13913 • Published Oct 15 • 3
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math Paper • 2510.13744 • Published Oct 15 • 5
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents Paper • 2509.06283 • Published Sep 8 • 17
Sasha: Creative Goal-Oriented Reasoning in Smart Homes with Large Language Models Paper • 2305.09802 • Published May 16, 2023
Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices Paper • 2509.02523 • Published Sep 2 • 7
Noise Contrastive Alignment of Language Models with Explicit Rewards Paper • 2402.05369 • Published Feb 8, 2024 • 2
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models Paper • 2405.04233 • Published May 7, 2024 • 3
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Paper • 2506.08009 • Published Jun 9 • 30
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator Paper • 2503.01103 • Published Mar 3 • 5
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Paper • 2502.15894 • Published Feb 21 • 20
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates Paper • 2407.06249 • Published Jul 8, 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" Paper • 2410.03727 • Published Sep 30, 2024 • 2
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Paper • 2502.06608 • Published Feb 10 • 39
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks Paper • 2411.05361 • Published Nov 8, 2024 • 3