Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2502.18449

about 11 hours ago

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12, 2024 • 64
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28, 2024 • 41
Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7, 2024 • 46
Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11, 2024 • 30

about 6 hours ago

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published 1 day ago • 28

about 11 hours ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 106
ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 80
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Paper • 2412.15084 • Published Dec 19, 2024 • 13
The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 91

about 14 hours ago

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 257
Transformer^2: Self-adaptive LLMs

Paper • 2501.06252 • Published Jan 9 • 53
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Paper • 2501.09012 • Published Jan 15 • 10
FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published Jan 16 • 23

Large Language Model (LLM) and NLP related papers.

about 2 hours ago

LoRA+: Efficient Low Rank Adaptation of Large Models

Paper • 2402.12354 • Published Feb 19, 2024 • 6
The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20, 2024 • 21
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20, 2024 • 13
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69

Code Generation

about 7 hours ago

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Paper • 2402.01391 • Published Feb 2, 2024 • 42
Code Representation Learning At Scale

Paper • 2402.01935 • Published Feb 2, 2024 • 13
Long Code Arena: a Set of Benchmarks for Long-Context Code Models

Paper • 2406.11612 • Published Jun 17, 2024 • 25
Agentless: Demystifying LLM-based Software Engineering Agents

Paper • 2407.01489 • Published Jul 1, 2024 • 59

about 11 hours ago

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 147
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 30
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 23
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs