Collections
Collections including paper arxiv:2506.21103
- Learning to Skip the Middle Layers of Transformers
  Paper • 2506.21103 • Published • 10
- tim-lawson/skip-middle-fineweb-baseline-2-layers
  Text Generation • 0.1B • Updated • 10
- tim-lawson/skip-middle-fineweb-baseline-4-layers
  Text Generation • 0.1B • Updated • 103
- tim-lawson/skip-middle-fineweb-baseline-6-layers
  Text Generation • 0.1B • Updated • 9

- You Do Not Fully Utilize Transformer's Representation Capacity
  Paper • 2502.09245 • Published • 38
- LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
  Paper • 2502.15007 • Published • 175
- Transformers without Normalization
  Paper • 2503.10622 • Published • 166
- Forgetting Transformer: Softmax Attention with a Forget Gate
  Paper • 2503.02130 • Published • 32

- CoRAG: Collaborative Retrieval-Augmented Generation
  Paper • 2504.01883 • Published • 10
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
  Paper • 2504.08837 • Published • 43
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
  Paper • 2504.10068 • Published • 30
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
  Paper • 2504.10481 • Published • 84

- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 24
- Differential Transformer
  Paper • 2410.05258 • Published • 179
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 8
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 30