-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 99
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 109
Collections
Collections including paper arxiv:2403.04642
-
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Paper • 2402.14848 • Published • 18
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 46
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper • 2403.04732 • Published • 18
Learning to Reason and Memorize with Self-Notes
Paper • 2305.00833 • Published • 4
-
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 99
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 46
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 57
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Paper • 2403.02884 • Published • 15
-
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 12
More Agents Is All You Need
Paper • 2402.05120 • Published • 51
Scaling Laws for Forgetting When Fine-Tuning Large Language Models
Paper • 2401.05605 • Published
Aligning Large Language Models with Counterfactual DPO
Paper • 2401.09566 • Published • 2
-
Diffusion World Model
Paper • 2402.03570 • Published • 7
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Paper • 2401.16335 • Published • 1
Towards Efficient and Exact Optimization of Language Model Alignment
Paper • 2402.00856 • Published
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Paper • 2402.07319 • Published • 13
-
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 109
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 99
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Paper • 2402.14830 • Published • 24
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 46
-
Metadata Might Make Language Models Better
Paper • 2211.10086 • Published • 4
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs
Paper • 2304.14999 • Published • 2
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques
Paper • 2401.02122 • Published • 2
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 121
-
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper • 2312.04474 • Published • 29
Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning
Paper • 2312.08901 • Published
Learning From Mistakes Makes LLM Better Reasoner
Paper • 2310.20689 • Published • 28
Making Large Language Models Better Reasoners with Step-Aware Verifier
Paper • 2206.02336 • Published • 1
-
Unicron: Economizing Self-Healing LLM Training at Scale
Paper • 2401.00134 • Published • 9
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
Paper • 2401.00788 • Published • 21
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Paper • 2401.04398 • Published • 20
The Impact of Reasoning Step Length on Large Language Models
Paper • 2401.04925 • Published • 15
-
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Paper • 2210.14986 • Published • 5
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Paper • 2311.10702 • Published • 18
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 75
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Paper • 2309.04269 • Published • 32