pgarbacki's Collections

reasoning

updated Feb 9

  • Training Large Language Models to Reason in a Continuous Latent Space

    Paper • 2412.06769 • Published Dec 9, 2024 • 86

  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

    Paper • 2408.03314 • Published Aug 6, 2024 • 63

  • ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights

    Paper • 2406.14596 • Published Jun 20, 2024 • 5

  • A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More

    Paper • 2407.16216 • Published Jul 23, 2024

  • Thinking LLMs: General Instruction Following with Thought Generation

    Paper • 2410.10630 • Published Oct 14, 2024 • 20

  • rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

    Paper • 2501.04519 • Published Jan 8 • 277

  • TextGrad: Automatic "Differentiation" via Text

    Paper • 2406.07496 • Published Jun 11, 2024 • 32

  • Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving

    Paper • 2002.03629 • Published Feb 10, 2020

  • LIMO: Less is More for Reasoning

    Paper • 2502.03387 • Published Feb 5 • 61