ron-wolf's Collections
Reading list
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper
•
2412.11768
•
Published
•
41
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper
•
2412.14161
•
Published
•
51
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
Paper
•
2408.10945
•
Published
•
11
PDFTriage: Question Answering over Long, Structured Documents
Paper
•
2309.08872
•
Published
•
54
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
Paper
•
2412.13171
•
Published
•
31
The Matrix Calculus You Need For Deep Learning
Paper
•
1802.01528
•
Published
•
2
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Paper
•
2202.05780
•
Published
Recurrent Memory Transformer
Paper
•
2207.06881
•
Published
•
1
How many words does ChatGPT know? The answer is ChatWords
Paper
•
2309.16777
•
Published
•
1
Weaver: Foundation Models for Creative Writing
Paper
•
2401.17268
•
Published
•
44
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Paper
•
2308.09687
•
Published
•
7
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Paper
•
2306.05426
•
Published
Think before you speak: Training Language Models With Pause Tokens
Paper
•
2310.02226
•
Published
•
2
What do tokens know about their characters and how do they know it?
Paper
•
2206.02608
•
Published
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper
•
2404.07143
•
Published
•
107