Collections

Collections including paper arxiv:2505.02387

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

RL+reason model

about 2 hours ago

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 121
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

RM-R1: Reward Modeling as Reasoning

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published 11 days ago • 66
gaotang/RM-R1-Distill-SFT

Viewer • Updated 8 days ago • 8.75k • 129 • 1
gaotang/RM-R1-after-Distill-RLVR

Viewer • Updated 8 days ago • 64.2k • 110 • 1

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Paper • 2505.04842 • Published 8 days ago • 12
ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Paper • 2505.04588 • Published 8 days ago • 58
WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Paper • 2504.21776 • Published 15 days ago • 47
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published 18 days ago • 35

Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Paper • 2504.20752 • Published 17 days ago • 87
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published 11 days ago • 66

Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Paper • 2504.20157 • Published 17 days ago • 35
The Leaderboard Illusion

Paper • 2504.20879 • Published 17 days ago • 68
ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published 17 days ago • 52
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published 11 days ago • 66

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published 23 days ago • 107
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published 25 days ago • 83
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published 11 days ago • 66

Latent Reasoning

Deliberation in Latent Space via Differentiable Cache Augmentation

Paper • 2412.17747 • Published Dec 23, 2024 • 33
Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 85
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published 11 days ago • 66

JudgeLRM: Large Reasoning Models as a Judge

Paper • 2504.00050 • Published Mar 31 • 60
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published 11 days ago • 66
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published 18 days ago • 35
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Paper • 2505.03318 • Published 10 days ago • 87

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 103
