pgarbacki's Collections

foundational models

updated Apr 4

  • DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Paper • 2405.04434 • Published May 7, 2024 • 21

    Note: MLA (Multi-head Latent Attention)


  • Titans: Learning to Memorize at Test Time

    Paper • 2501.00663 • Published Dec 31, 2024 • 23

  • Transformer^2: Self-adaptive LLMs

    Paper • 2501.06252 • Published Jan 9, 2025 • 55

  • Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

    Paper • 2502.11089 • Published Feb 16, 2025 • 159

  • Trust Region Policy Optimization

    Paper • 1502.05477 • Published Feb 19, 2015

  • DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Paper • 2402.03300 • Published Feb 5, 2024 • 120

  • Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

    Paper • 2503.09573 • Published Mar 12, 2025 • 72