Julius-L's Collections

Model Architecture

updated Nov 4, 2024

  • Differential Transformer

    Paper • 2410.05258 • Published Oct 7, 2024 • 179

  • Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

    Paper • 2410.20672 • Published Oct 28, 2024 • 6

  • TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

    Paper • 2410.23168 • Published Oct 30, 2024 • 24