ron-wolf's Collections: Reading list
- No More Adam: Learning Rate Scaling at Initialization is All You Need (arXiv:2412.11768)
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks (arXiv:2412.14161)
- HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments (arXiv:2408.10945)
- PDFTriage: Question Answering over Long, Structured Documents (arXiv:2309.08872)
- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations (arXiv:2412.13171)
- The Matrix Calculus You Need For Deep Learning (arXiv:1802.01528)
- A Modern Self-Referential Weight Matrix That Learns to Modify Itself (arXiv:2202.05780)
- Recurrent Memory Transformer (arXiv:2207.06881)
- How many words does ChatGPT know? The answer is ChatWords (arXiv:2309.16777)
- Weaver: Foundation Models for Creative Writing (arXiv:2401.17268)
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models (arXiv:2308.09687)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking (arXiv:2306.05426)
- Think before you speak: Training Language Models With Pause Tokens (arXiv:2310.02226)
- What do tokens know about their characters and how do they know it? (arXiv:2206.02608)
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching (arXiv:2503.05179)
- Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers (arXiv:2504.18412)
- Chain of Draft: Thinking Faster by Writing Less (arXiv:2502.18600)
- Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models (arXiv:2506.19697)
- Jasper and Stella: distillation of SOTA embedding models (arXiv:2412.19048)
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (arXiv:2301.13688)
- Puzzle: Distillation-Based NAS for Inference-Optimized LLMs (arXiv:2411.19146)
- Chain-of-Thought Reasoning Without Prompting (arXiv:2402.10200)
- Robust and Fine-Grained Detection of AI Generated Texts (arXiv:2504.11952)
- Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning (arXiv:2507.00432)
- The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation (arXiv:2507.05578)
- Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (arXiv:2507.09089)
- Stochastic LLMs do not Understand Language: Towards Symbolic, Explainable and Ontologically Based LLMs (arXiv:2309.05918)
- The Debate Over Understanding in AI's Large Language Models (arXiv:2210.13966)
- Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (arXiv:2210.13382)
- Evidence of Meaning in Language Models Trained on Programs (arXiv:2305.11169)