Collections
Collections including paper arxiv:2506.21103
- Learning to Skip the Middle Layers of Transformers
  Paper • 2506.21103 • Published • 10
- tim-lawson/skip-middle-fineweb-baseline-2-layers
  Text Generation • 0.1B • Updated • 10
- tim-lawson/skip-middle-fineweb-baseline-4-layers
  Text Generation • 0.1B • Updated • 103
- tim-lawson/skip-middle-fineweb-baseline-6-layers
  Text Generation • 0.1B • Updated • 9

- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 38
- LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
  Paper • 2502.15007 • Published • 175
- Transformers without Normalization
  Paper • 2503.10622 • Published • 166
- Forgetting Transformer: Softmax Attention with a Forget Gate
  Paper • 2503.02130 • Published • 32

- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 10
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
  Paper • 2504.10068 • Published • 30
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
  Paper • 2504.10481 • Published • 84

- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 24
- Differential Transformer
  Paper • 2410.05258 • Published • 179
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 8
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 30