Kaiyan Zhang

iseesaw

Recent Activity

liked a model about 8 hours ago

deepseek-ai/DeepSeek-R1-0528

liked a dataset 1 day ago

open-r1/Mixture-of-Thoughts

iseesaw's activity

upvoted a paper about 17 hours ago

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published 1 day ago • 82

upvoted 2 papers about 1 month ago

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 111

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 61

upvoted a collection about 2 months ago

Gemma 3 Release

Collection • 24 items • Updated Apr 18 • 374

upvoted 2 papers 2 months ago

Video-T1: Test-Time Scaling for Video Generation

Paper • 2503.18942 • Published Mar 24 • 88

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models

Paper • 2503.11224 • Published Mar 14 • 27

upvoted an article 3 months ago

Open R1: Update #3

Article • and 9 others • Mar 11 • 291

upvoted a collection 3 months ago

Qwen2.5-Coder

Collection • Code-specific model series based on Qwen2.5 • 40 items • Updated about 1 month ago • 316

upvoted a paper 3 months ago

Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published Feb 14 • 18

upvoted an article 3 months ago

Our Transformers Code Agent beats the GAIA benchmark!

Article • and 1 other • Jul 1, 2024 • 88

upvoted 3 articles 4 months ago

Open-source DeepResearch – Freeing our search agents

Article • and 4 others • Feb 4 • 1.25k

What is test-time compute and how to scale it?

Article • and 1 other • Feb 6 • 89

Open R1: Update #2

Article • and 6 others • Feb 10 • 213

upvoted 2 papers 4 months ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10 • 153

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Paper • 2501.18362 • Published Jan 30 • 22

upvoted an article 5 months ago

Process Reinforcement through Implicit Rewards

Article • and 1 other • Jan 3 • 27

upvoted a collection 5 months ago

Reasoning Datasets

Collection • Reasoning datasets that are trending 🔥 • 10 items • Updated Jan 3 • 24

upvoted a paper 5 months ago

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 42

upvoted a paper 6 months ago

Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 35

upvoted a paper 11 months ago

Towards Building Specialized Generalist AI with System 1 and System 2 Fusion

Paper • 2407.08642 • Published Jul 11, 2024 • 11