Collections

Collections including paper arxiv:2412.21187

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 37
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 45
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published 27 days ago • 35
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 45

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

Paper • 2501.03916 • Published 19 days ago • 14
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published 18 days ago • 89
Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published 19 days ago • 81
Search-o1: Agentic Search-Enhanced Large Reasoning Models

Paper • 2501.05366 • Published 17 days ago • 80

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 25 days ago • 98
Are Vision-Language Models Truly Understanding Multi-vision Sensor?

Paper • 2412.20750 • Published 27 days ago • 20
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published 27 days ago • 36
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 96

interest_need_read

A collection of popular papers of interest

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 78
Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published Dec 10, 2024 • 27
OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 31
Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published Dec 23, 2024 • 42

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published 27 days ago • 36
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Paper • 2412.21037 • Published 27 days ago • 23

Code&Math&Reasoning

Evaluating and Aligning CodeLLMs on Human Preference

Paper • 2412.05210 • Published Dec 6, 2024 • 47
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 45
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published 27 days ago • 36
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving

Paper • 2412.20735 • Published 27 days ago • 11

Meta-Learning a Dynamical Language Model

Paper • 1803.10631 • Published Mar 28, 2018
TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation

Paper • 2003.11963 • Published Mar 26, 2020
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model

Paper • 2212.04960 • Published Dec 9, 2022 • 1
Continuous Learning in a Hierarchical Multiscale Neural Network

Paper • 1805.05758 • Published May 15, 2018 • 1

about 23 hours ago

On Memorization of Large Language Models in Logical Reasoning

Paper • 2410.23123 • Published Oct 30, 2024 • 18
LLMs Do Not Think Step-by-step In Implicit Reasoning

Paper • 2411.15862 • Published Nov 24, 2024 • 8
Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published Dec 9, 2024 • 75
Deliberation in Latent Space via Differentiable Cache Augmentation

Paper • 2412.17747 • Published Dec 23, 2024 • 29

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Paper • 2410.13639 • Published Oct 17, 2024 • 17
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24, 2024 • 40
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs

Paper • 2412.03205 • Published Dec 4, 2024 • 16
Free Process Rewards without Process Labels

Paper • 2412.01981 • Published Dec 2, 2024 • 31

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 53
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Paper • 2306.01693 • Published Jun 2, 2023 • 3
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 147
Secrets of RLHF in Large Language Models Part II: Reward Modeling

Paper • 2401.06080 • Published Jan 11, 2024 • 27
