stereoplegic's Collections: Long context
TRAMS: Training-free Memory Selection for Long-range Language Modeling (arXiv:2310.15494)
A Long Way to Go: Investigating Length Correlations in RLHF (arXiv:2310.03716)
YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
Giraffe: Adventures in Expanding Context Lengths in LLMs (arXiv:2308.10882)
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models (arXiv:2308.16137)
Scaling Transformer to 1M tokens and beyond with RMT (arXiv:2304.11062)
Investigating Answerability of LLMs for Long-Form Question Answering (arXiv:2309.08210)
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models (arXiv:2309.14509)
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (arXiv:2309.12307)
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training (arXiv:2309.10400)
CLEX: Continuous Length Extrapolation for Large Language Models (arXiv:2310.16450)
Code Llama: Open Foundation Models for Code (arXiv:2308.12950)
CAT-LM: Training Language Models on Aligned Code And Tests (arXiv:2310.01602)
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (arXiv:2308.14508)
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model (arXiv:2309.11568)
arXiv:2309.03450
Effective Long-Context Scaling of Foundation Models (arXiv:2309.16039)
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression (arXiv:2310.06839)
Context Compression for Auto-regressive Transformers with Sentinel Tokens (arXiv:2310.08152)
Learning to Compress Prompts with Gist Tokens (arXiv:2304.08467)
Long-range Language Modeling with Self-retrieval (arXiv:2306.13421)
Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model (arXiv:2212.09146)
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks (arXiv:2305.18395)
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency (arXiv:2304.11477)
SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge (arXiv:2308.12682)
Combiner: Full Attention Transformer with Sparse Computation Cost (arXiv:2107.05768)
Memorizing Transformers (arXiv:2203.08913)
Adapting Language Models to Compress Contexts (arXiv:2305.14788)
Lost in the Middle: How Language Models Use Long Contexts (arXiv:2307.03172)
L-Eval: Instituting Standardized Evaluation for Long Context Language Models (arXiv:2307.11088)
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies (arXiv:2302.06218)
Blockwise Parallel Transformer for Long Context Large Models (arXiv:2305.19370)
Blockwise Self-Attention for Long Document Understanding (arXiv:1911.02972)
LSG Attention: Extrapolation of pretrained Transformers to long sequences (arXiv:2210.15497)
Efficient Long-Text Understanding with Short-Text Models (arXiv:2208.00748)
Cure the headache of Transformers via Collinear Constrained Attention (arXiv:2309.08646)
Bird-Eye Transformers for Text Generation Models (arXiv:2210.03985)
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture (arXiv:2310.03052)
Efficient Streaming Language Models with Attention Sinks (arXiv:2309.17453)
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers (arXiv:2310.03294)
Ultra-Long Sequence Distributed Transformer (arXiv:2311.02382)
In-Context Pretraining: Language Modeling Beyond Document Boundaries (arXiv:2310.10638)
Retrieval meets Long Context Large Language Models (arXiv:2310.03025)
AWESOME: GPU Memory-constrained Long Document Summarization using Memory Mechanism and Global Salient Content (arXiv:2305.14806)
mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences (arXiv:2305.11129)
LongT5: Efficient Text-To-Text Transformer for Long Sequences (arXiv:2112.07916)
Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System (arXiv:2304.13343)
Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens (arXiv:2305.04241)
Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse (arXiv:2311.07468)
Never Lost in the Middle: Improving Large Language Models via Attention Strengthening Question Answering (arXiv:2311.09198)
SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences (arXiv:2208.02169)
System 2 Attention (is something you might need too) (arXiv:2311.11829)
Attention Sorting Combats Recency Bias In Long Context Language Models (arXiv:2310.01427)
CoLT5: Faster Long-Range Transformers with Conditional Computation (arXiv:2303.09752)
Cached Transformers: Improving Transformers with Differentiable Memory Cache (arXiv:2312.12742)
Axiomatic Preference Modeling for Longform Question Answering (arXiv:2312.02206)
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents (arXiv:2312.01279)
Extending Context Window of Large Language Models via Semantic Compression (arXiv:2312.09571)
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention (arXiv:2312.08618)
LongAlign: A Recipe for Long Context Alignment of Large Language Models (arXiv:2401.18058)
Extending LLMs' Context Window with 100 Samples (arXiv:2401.07004)
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey (arXiv:2401.07872)
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models (arXiv:2401.06951)
Exploring Transformer Extrapolation (arXiv:2307.10156)
Gated Linear Attention Transformers with Hardware-Efficient Training (arXiv:2312.06635)
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
Training-Free Long-Context Scaling of Large Language Models (arXiv:2402.17463)
LOCOST: State-Space Models for Long Document Abstractive Summarization (arXiv:2401.17919)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling (arXiv:2402.18508)
HMT: Hierarchical Memory Transformer for Long Context Language Processing (arXiv:2405.06067)
LLoCO: Learning Long Contexts Offline (arXiv:2404.07979)
LongHeads: Multi-Head Attention is Secretly a Long Context Processor (arXiv:2402.10685)
XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference (arXiv:2405.17755)
Base of RoPE Bounds Context Length (arXiv:2405.14591)
SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models (arXiv:2406.05678)
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models (arXiv:2406.00605)
Equipping Transformer with Random-Access Reading for Long-Context Understanding (arXiv:2405.13216)
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation (arXiv:2406.10996)
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory (arXiv:2402.04617)
Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope (arXiv:2407.15176)
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads (arXiv:2407.15891)
Writing in the Margins: Better Inference Pattern for Long Context Retrieval (arXiv:2408.14906)
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach (arXiv:2407.16833)
ReMamba: Equip Mamba with Effective Long-Sequence Modeling (arXiv:2408.15496)
General-purpose, long-context autoregressive modeling with Perceiver AR (arXiv:2202.07765)
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training (arXiv:2407.15892)
ContextCite: Attributing Model Generation to Context (arXiv:2409.00729)
UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs (arXiv:2406.18173)
MemLong: Memory-Augmented Retrieval for Long Text Modeling (arXiv:2408.16967)
Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads (arXiv:2407.17678)
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning (arXiv:2409.06679)
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (arXiv:2406.19707)
ACON: Optimizing Context Compression for Long-horizon LLM Agents (arXiv:2510.00615)
Global Context Compression with Interleaved Vision-Text Transformation (arXiv:2601.10378)