Shaobai Jiang

shaobaij

AI & ML interests

None yet

Recent Activity

upvoted a paper about 5 hours ago

RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning

upvoted a paper about 5 hours ago

FlexOlmo: Open Language Models for Flexible Data Use

upvoted a paper about 16 hours ago

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Organizations

None yet

upvoted 2 papers about 5 hours ago

RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning

Paper • 2507.07451 • Published about 1 month ago • 4

FlexOlmo: Open Language Models for Flexible Data Use

Paper • 2507.07024 • Published about 1 month ago • 6

upvoted a paper about 16 hours ago

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization

Paper • 2508.00222 • Published 8 days ago • 5

upvoted a paper 1 day ago

Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published 18 days ago • 57

upvoted 4 papers 3 days ago

upvoted an article 3 days ago

Article

Welcome GPT OSS, the new open-source model family from OpenAI!

and 11 others • 4 days ago • 414

upvoted 2 papers 4 days ago

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks

Paper • 2507.23751 • Published 9 days ago • 1

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Paper • 2507.21509 • Published 11 days ago • 25

upvoted 6 papers 6 days ago

Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models

Paper • 2507.07484 • Published about 1 month ago • 17

REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

Paper • 2507.10541 • Published 26 days ago • 29

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published 12 days ago • 75

Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published 12 days ago • 31

CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards

Paper • 2507.09104 • Published 28 days ago • 17

RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

Paper • 2507.12142 • Published 24 days ago • 36

upvoted a paper 10 days ago

Towards Conversational Diagnostic AI

Paper • 2401.05654 • Published Jan 11, 2024 • 21

upvoted 2 papers 11 days ago

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Paper • 2507.06261 • Published Jul 7 • 58

Checklists Are Better Than Reward Models For Aligning Language Models

Paper • 2507.18624 • Published 16 days ago • 2
