H

SunSwallow

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 25 days ago

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published 29 days ago • 79

upvoted a paper about 1 month ago

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

Paper • 2511.16668 • Published Nov 20 • 53

upvoted a paper 2 months ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 269

upvoted 3 papers 3 months ago

Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published Sep 26 • 29

Training-Free Group Relative Policy Optimization

Paper • 2510.08191 • Published Oct 9 • 44

From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature

Paper • 2509.16591 • Published Sep 20 • 2

upvoted a paper 4 months ago

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 227

upvoted a paper 5 months ago

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 141

upvoted a collection 5 months ago

OpenMathReasoning

Models and datasets from "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset" • 7 items • Updated 3 days ago • 46

commented a paper 5 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 158