LM Capabilities and Scaling - a kaizuberbuehler Collection

kaizuberbuehler 's Collections

Reasoning, Thinking, RL and Test-Time Scaling

Vision Language Models

Foundation Models

Synthetic Data and Self-Improvement

Agents

LM Prompt Engineering

LM Capabilities and Scaling

LM Architectures

Code Generation

EXL2 Quantized Models

LM Capabilities and Scaling

updated Apr 24

Compression Represents Intelligence Linearly

Paper • 2404.09937 • Published Apr 15, 2024 • 29
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Paper • 2404.06395 • Published Apr 9, 2024 • 23
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2, 2024 • 38
Are large language models superhuman chemists?

Paper • 2404.01475 • Published Apr 1, 2024 • 19
FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17, 2024 • 35
Capabilities of Gemini Models in Medicine

Paper • 2404.18416 • Published Apr 29, 2024 • 25
Imp: Highly Capable Large Multimodal Models for Mobile Devices

Paper • 2405.12107 • Published May 20, 2024 • 30
On the Planning Abilities of Large Language Models -- A Critical Investigation

Paper • 2305.15771 • Published May 25, 2023 • 1
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13, 2024 • 28
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Paper • 2406.09411 • Published Jun 13, 2024 • 20
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Paper • 2406.07394 • Published Jun 11, 2024 • 29
GEB-1.3B: Open Lightweight Large Language Model

Paper • 2406.09900 • Published Jun 14, 2024 • 21
Mixture of A Million Experts

Paper • 2407.04153 • Published Jul 4, 2024 • 5
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Paper • 2404.05405 • Published Apr 8, 2024 • 10
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Paper • 2408.06195 • Published Aug 12, 2024 • 74
Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5, 2024 • 92
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24, 2024 • 43
Making Text Embedders Few-Shot Learners

Paper • 2409.15700 • Published Sep 24, 2024 • 31
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data

Paper • 2406.14546 • Published Jun 20, 2024 • 2
Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 95
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2 • 53
ProgCo: Program Helps Self-Correction of Large Language Models

Paper • 2501.01264 • Published Jan 2 • 27
Densing Law of LLMs

Paper • 2412.04315 • Published Dec 5, 2024 • 19
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

Paper • 2411.17691 • Published Nov 26, 2024 • 13
PokerBench: Training Large Language Models to become Professional Poker Players

Paper • 2501.08328 • Published Jan 14 • 19
Do generative video models learn physical principles from watching videos?

Paper • 2501.09038 • Published Jan 14 • 35
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

Paper • 2501.12370 • Published Jan 21 • 11
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published Jan 28 • 32
Large Language Models Think Too Fast To Explore Effectively

Paper • 2501.18009 • Published Jan 29 • 24
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Paper • 2502.08946 • Published Feb 13 • 195
Scaling Embedding Layers in Language Models

Paper • 2502.01637 • Published Feb 3 • 24
Great Models Think Alike and this Undermines AI Oversight

Paper • 2502.04313 • Published Feb 6 • 34
Scaling Pre-training to One Hundred Billion Data for Vision Language Models

Paper • 2502.07617 • Published Feb 11 • 29
Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Paper • 2502.06857 • Published Feb 7 • 25
Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12 • 49
NoLiMa: Long-Context Evaluation Beyond Literal Matching

Paper • 2502.05167 • Published Feb 7 • 15
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Paper • 2502.13063 • Published Feb 18 • 73
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective

Paper • 2502.17262 • Published Feb 24 • 21
Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 28
Implicit Reasoning in Transformers is Reasoning through Shortcuts

Paper • 2503.07604 • Published Mar 10 • 22
Shifting Long-Context LLMs Research from Input to Output

Paper • 2503.04723 • Published Mar 6 • 22
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation

Paper • 2503.04872 • Published Mar 6 • 15