hanhui

clearhanhui

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 1 day ago

Spatia: Video Generation with Updatable Spatial Memory

Paper • 2512.15716 • Published 10 days ago • 20

upvoted a paper about 2 months ago

Continuous Autoregressive Language Models

Paper • 2510.27688 • Published Oct 31 • 70

upvoted a paper 3 months ago

Single-stream Policy Optimization

Paper • 2509.13232 • Published Sep 16 • 34

upvoted 5 papers 4 months ago

Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents

Paper • 2509.09265 • Published Sep 11 • 47

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9 • 101

Set Block Decoding is a Language Model Inference Accelerator

Paper • 2509.04185 • Published Sep 4 • 52

rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published Aug 28 • 116

PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs

Paper • 2508.17188 • Published Aug 24 • 17

upvoted 5 papers 5 months ago

VertexRegen: Mesh Generation with Continuous Level of Detail

Paper • 2508.09062 • Published Aug 12 • 38

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published Aug 11 • 43

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 180

MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Paper • 2507.14683 • Published Jul 19 • 134

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14 • 89

upvoted a paper 6 months ago

Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency

Paper • 2506.08343 • Published Jun 10 • 54