Wenkai Yang

Keven16

https://keven980716.github.io/

keven980716

AI & ML interests

None yet

Recent Activity

upvoted a paper about 1 month ago

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

authored a paper about 1 month ago

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

upvoted a paper about 1 month ago

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

Organizations

None yet

upvoted 2 papers about 1 month ago

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

Paper • 2606.02684 • Published Jun 1 • 17

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

Paper • 2606.04703 • Published Jun 3 • 26

upvoted a paper 3 months ago

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 114

upvoted a paper 4 months ago

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

Paper • 2603.14465 • Published Mar 15 • 23

upvoted 3 papers 5 months ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Paper • 2602.12125 • Published Feb 12 • 68

Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning

Paper • 2602.09439 • Published Feb 10 • 14

AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research

Paper • 2602.06540 • Published Feb 6 • 22

upvoted a paper 6 months ago

DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution

Paper • 2601.13761 • Published Jan 20 • 16

upvoted 3 papers 7 months ago

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2512.20848 • Published Dec 23, 2025 • 44

NVIDIA Nemotron 3: Efficient and Open Intelligence

Paper • 2512.20856 • Published Dec 24, 2025 • 44

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Paper • 2512.16093 • Published Dec 18, 2025 • 97

upvoted 2 papers 8 months ago

Mixture of Horizons in Action Chunking

Paper • 2511.19433 • Published Nov 24, 2025 • 18

Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning

Paper • 2510.27623 • Published Oct 31, 2025 • 13

upvoted 2 papers 9 months ago

Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance

Paper • 2502.12459 • Published Feb 18, 2025 • 3

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

Paper • 2510.14943 • Published Oct 16, 2025 • 40

upvoted a collection 9 months ago

AEPO

Collection • The official datasets and model checkpoints of AEPO • 5 items • Updated Dec 20, 2025 • 4

upvoted an article 11 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • NormalUhr • Feb 11, 2025 • 129

upvoted a paper 12 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161

upvoted 2 papers about 1 year ago

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16, 2025 • 278

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
