jxtngx's Collections
LM papers
Attention Is All You Need • arXiv:1706.03762
LLaMA: Open and Efficient Foundation Language Models • arXiv:2302.13971
Efficient Tool Use with Chain-of-Abstraction Reasoning • arXiv:2401.17464
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts • arXiv:2407.21770
LoRA: Low-Rank Adaptation of Large Language Models • arXiv:2106.09685
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness • arXiv:2205.14135
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning • arXiv:2307.08691
8-bit Optimizers via Block-wise Quantization • arXiv:2110.02861
RoFormer: Enhanced Transformer with Rotary Position Embedding • arXiv:2104.09864
Efficiently Modeling Long Sequences with Structured State Spaces • arXiv:2111.00396
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers • arXiv:2210.17323
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • arXiv:2312.00752
The Unreasonable Ineffectiveness of the Deeper Layers • arXiv:2403.17887
RoBERTa: A Robustly Optimized BERT Pretraining Approach • arXiv:1907.11692
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • arXiv:1810.04805
Universal Language Model Fine-tuning for Text Classification • arXiv:1801.06146
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs • arXiv:1603.09320
Language Models are Few-Shot Learners • arXiv:2005.14165
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework • arXiv:2308.08155
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena • arXiv:2306.05685
The Perfect Blend: Redefining RLHF with Mixture of Judges • arXiv:2409.20370
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism • arXiv:1909.08053
ReAct: Synergizing Reasoning and Acting in Language Models • arXiv:2210.03629
Agent-as-a-Judge: Evaluate Agents with Agents • arXiv:2410.10934