Hugging Face
tyzhu's Collections

pretraining

updated Mar 6

  • Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

    Paper • 2502.15499 • Published Feb 21 • 13

  • MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

    Paper • 2502.17422 • Published Feb 24 • 7

  • The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

    Paper • 2502.17535 • Published Feb 24 • 8

  • Scaling LLM Pre-training with Vocabulary Curriculum

    Paper • 2502.17910 • Published Feb 25 • 1

  • LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

    Paper • 2502.15007 • Published Feb 20 • 175

  • Predictive Data Selection: The Data That Predicts Is the Data That Teaches

    Paper • 2503.00808 • Published Mar 2 • 57