Cognition
Perception and abstraction. Each modality is tokenized and embedded into vectors for the model to comprehend.
Paper • 2407.17453 • Published • 38
Note: A general model is not great at specialized tasks. A narrow-domain fine-tuned checkpoint becomes better at specific tasks, and such local improvement can feed back into the full training dataset, achieving self-augmentation-based improvement. This is an interesting idea.
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 117
Note: Uses a small language model to search the graph and route queries to the domain expert.
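A minimal sketch of that routing step, under stated assumptions: the registry `EXPERTS` and the functions `embed` and `route` are hypothetical names, and the query embedding is a random stand-in for whatever the small router model actually produces. The point is only that routing reduces to a nearest-expert lookup in a shared embedding space.

```python
import numpy as np

# Hypothetical registry: each domain expert's description, embedded
# offline into a shared vector space by the small router model.
EXPERTS = {
    "math":    np.array([0.9, 0.1, 0.0]),
    "biology": np.array([0.1, 0.8, 0.2]),
    "law":     np.array([0.0, 0.2, 0.9]),
}

def embed(query: str) -> np.ndarray:
    """Stand-in for the small router model's query embedding."""
    rng = np.random.default_rng(abs(hash(query)) % 2**32)
    v = rng.random(3)
    return v / np.linalg.norm(v)

def route(query: str) -> str:
    """Send the query to the expert with the highest cosine similarity."""
    q = embed(query)
    return max(EXPERTS,
               key=lambda name: float(q @ EXPERTS[name]) / np.linalg.norm(EXPERTS[name]))

print(route("What is the derivative of x^2?"))  # e.g. 'math'
```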
Octo-planner: On-device Language Model for Planner-Action Agents
Paper • 2406.18082 • Published • 47
Note: Automatic flow engineering done by a fine-tuned 3B LLM, grounded in a selective set of API-based functions. The planning model performs task decomposition but does not make the specific calls, effectively doing flow (prompt) engineering. Topology in the plans is lacking, and the static plan-ahead approach is less robust (although it scores well on their curated 1k-example test dataset).
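To make the separation of concerns concrete, here is a minimal sketch (the prompt template and both model stubs are hypothetical, not taken from the paper): the planner only emits a numbered decomposition, and a separate action model grounds each step in an API call after the whole plan is already fixed, which is exactly the static plan-ahead limitation noted above.

```python
# Hypothetical planner/action split; both LLM stubs are stand-ins.
PLAN_PROMPT = "Decompose the task into numbered sub-steps:\n{task}"

def planner_llm(prompt: str) -> list[str]:
    """Stand-in for the fine-tuned 3B planning model: decompose only."""
    return ["1. Open the camera app", "2. Take a photo", "3. Share via email"]

def action_llm(step: str) -> str:
    """Stand-in for the action model that emits one function call per step."""
    return f"call_api(step={step!r})"

def run(task: str) -> list[str]:
    steps = planner_llm(PLAN_PROMPT.format(task=task))
    # Static plan-ahead: every step is fixed before any call executes.
    return [action_llm(s) for s in steps]

print(run("Email a photo to Alice"))
```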
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper • 2408.15518 • Published • 42
Iterative Graph Alignment
Paper • 2408.16667 • Published • 2
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Paper • 2408.16725 • Published • 52
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 124
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525 • Published • 39
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 31
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper • 2409.02889 • Published • 54
Law of Vision Representation in MLLMs
Paper • 2408.16357 • Published • 92
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Paper • 2408.05211 • Published • 46
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Paper • 2408.01800 • Published • 76
NVLM: Open Frontier-Class Multimodal LLMs
Paper • 2409.11402 • Published • 71
WaveletGPT: Wavelets Meet Large Language Models
Paper • 2409.12924 • Published • 1
Note: Treats intermediate embedding sequences as a bundle of signals and applies 1D convolution along the temporal axis, similar in some sense to ConvMixer's manipulation; experiments are conducted on pre-training transformers. Interesting results are reported in the paper. Unfortunately no 'wave' is actually applied, and no 'periodic' information is captured.
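For intuition, a minimal sketch of the general operation the note describes: a depthwise 1D convolution run along the sequence axis of intermediate embeddings, so each embedding channel is filtered as its own temporal signal. The kernel size and placement are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

batch, seq_len, dim = 2, 16, 64
hidden = torch.randn(batch, seq_len, dim)   # intermediate embeddings

# Depthwise 1D conv: each embedding channel is treated as its own
# temporal signal, echoing the "bunch of signals" view above.
conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)

x = hidden.transpose(1, 2)                  # (batch, dim, seq_len) for Conv1d
smoothed = conv(x).transpose(1, 2)          # back to (batch, seq_len, dim)
print(smoothed.shape)                       # torch.Size([2, 16, 64])
```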
ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs
Paper • 2403.09724 • Published • 1
Learning Iterative Reasoning through Energy Diffusion
Paper • 2406.11179 • Published • 1
Note: Newton's introduction of gravity illustrates how understanding derivatives (knowing how things move rather than just where they are) enhances reasoning about the world. Large language models (LLMs), while excelling at compressing data distributions, struggle with reasoning. Reasoning involves grasping the 'abstract structure' of data. Therefore, by modeling derivatives of data distributions, could we improve LLMs' reasoning capabilities?
Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding
Paper • 2106.02795 • Published • 1
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 99
Can LLMs Reason in the Wild with Programs?
Paper • 2406.13764 • Published • 1
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Paper • 2409.17481 • Published • 46
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Paper • 2409.17115 • Published • 59
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
Paper • 2403.03419 • Published • 1
Emu3: Next-Token Prediction is All You Need
Paper • 2409.18869 • Published • 89
Note: Tokenization unifies perception and generation; end-to-end training on discrete multi-modal signals enables both.
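A minimal sketch of the shared-vocabulary view (the vocabulary sizes and the BOI/EOI markers are hypothetical): shifting VQ codebook indices past the text ids yields one flat token stream that a vanilla decoder can model with plain next-token prediction, covering both perception (image tokens in the prefix) and generation (image tokens as targets).

```python
# Hypothetical vocabulary layout for a unified discrete token stream.
TEXT_VOCAB = 32_000          # ordinary BPE ids occupy [0, 32000)
IMAGE_VOCAB = 8_192          # VQ codebook ids are shifted past the text ids
BOI, EOI = 40_192, 40_193    # hypothetical begin/end-of-image markers

def image_to_tokens(vq_codes: list[int]) -> list[int]:
    """Shift VQ codebook indices into the shared vocabulary."""
    return [BOI] + [TEXT_VOCAB + c for c in vq_codes] + [EOI]

# One flat sequence: caption tokens followed by the image's tokens.
caption = [17, 943, 52]                          # pretend BPE ids
sequence = caption + image_to_tokens([5, 1042, 77, 8191])
print(sequence)   # a single stream a plain decoder can model next-token
```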
Can Models Learn Skill Composition from Examples?
Paper • 2409.19808 • Published • 8
Not All LLM Reasoners Are Created Equal
Paper • 2410.01748 • Published • 27
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
Paper • 2410.01044 • Published • 34
Intelligence at the Edge of Chaos
Paper • 2410.02536 • Published • 6
Note: Intelligence is very likely the ability to model higher-order derivatives given lower-order observations.
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Paper • 2410.02155 • Published • 1
Note: MLLMs usually project a continuous image embedding onto the hidden space of the LLM. Vector quantization (VQ) converts an image into discrete codes representing each of its patches; these tokens can be ported into the LLM in much the same fashion as text tokens, via new embedding vectors. A natural extension is therefore to re-use the BPE approach on these image tokens, which is precisely what happens in this work.
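A minimal sketch of BPE applied to VQ image codes (toy data; the paper's codebook size, merge schedule, and corpus differ): frequent adjacent patch-code pairs are merged into new composite tokens, exactly as byte pairs are merged for text.

```python
from collections import Counter

def most_frequent_pair(seq):
    """Find the most common adjacent pair of codes."""
    return Counter(zip(seq, seq[1:])).most_common(1)[0][0]

def merge(seq, pair, new_id):
    """Replace every occurrence of `pair` with the new composite token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

codes = [5, 7, 5, 7, 3, 5, 7, 3]   # VQ codes for one image, row-major
next_id = 1024                     # first id past the toy VQ codebook
for _ in range(2):                 # learn two merges
    pair = most_frequent_pair(codes)
    codes = merge(codes, pair, next_id)
    print(f"merged {pair} -> {next_id}: {codes}")
    next_id += 1
```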
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation
Paper • 2410.02725 • Published • 1
Selective Attention Improves Transformer
Paper • 2410.02703 • Published • 23
Note: "If two computer programs perform the same task, the shorter one is generally better." This principle, known as Occam's Razor, is a critical guideline for scientific discovery. Our best program today is the Transformer. Can we make it more efficient? Selective attention improves the Transformer by allowing each token to decide whether previous context is still relevant for future tokens.
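A minimal sketch of how such selective masking can work, following my reading of the mechanism (the relu scoring and the accumulation details are illustrative, not a faithful reproduction of the paper): each token emits non-negative scores against earlier tokens, the scores are accumulated down the sequence, and the running total is subtracted from the attention logits so deselected context fades out for later positions.

```python
import torch

n, d = 6, 8
q, k = torch.randn(n, d), torch.randn(n, d)
logits = q @ k.T / d**0.5                      # standard attention logits

s = torch.relu(logits)                         # selection scores, >= 0
s = torch.tril(s, diagonal=-1)                 # only earlier tokens selectable
f = torch.cumsum(s, dim=0)                     # accumulate selections over time
masked_logits = logits - f                     # penalize deselected tokens

causal = torch.tril(torch.ones(n, n)).bool()
masked_logits = masked_logits.masked_fill(~causal, float("-inf"))
attn = masked_logits.softmax(dim=-1)
print(attn[-1])   # final token's attention after selective masking
```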
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 24
EmbedLLM: Learning Compact Representations of Large Language Models
Paper • 2410.02223 • Published • 3
Model Comparisons: XNet Outperforms KAN
Paper • 2410.02033 • Published • 1
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Paper • 2410.01930 • Published • 1
Addition is All You Need for Energy-efficient Language Models
Paper • 2410.00907 • Published • 143
ε-VAE: Denoising as Visual Decoding
Paper • 2410.04081 • Published • 7
Note: I find it strange to view an encoder that produces embedding vectors as a type of tokenization: the transformer then effectively has two tokenization processes, a discrete one and then a continuous one?
Emergent properties with repeated examples
Paper • 2410.07041 • Published • 8
Note: Compression requires redundancy; otherwise it's just memorization.
Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models
Paper • 2410.06981 • Published • 1
Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines
Paper • 2410.07896 • Published • 2
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding
Paper • 2408.08252 • Published • 1
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
Paper • 2410.08197 • Published • 1
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Paper • 2410.06940 • Published • 4
LeanAgent: Lifelong Learning for Formal Theorem Proving
Paper • 2410.06209 • Published • 1
SimpleStrat: Diversifying Language Model Generation with Stratification
Paper • 2410.09038 • Published • 4
Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation
Paper • 2410.08821 • Published • 1
Discrete Flow Matching
Paper • 2407.15595 • Published • 11
Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
Paper • 2410.11081 • Published • 16
EVOLvE: Evaluating and Optimizing LLMs For Exploration
Paper • 2410.06238 • Published • 1
Neural Metamorphosis
Paper • 2410.11878 • Published • 7
Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming
Paper • 2410.12112 • Published • 1
Steering Large Language Models between Code Execution and Textual Reasoning
Paper • 2410.03524 • Published • 1
A Scalable Communication Protocol for Networks of Large Language Models
Paper • 2410.11905 • Published • 1
Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL
Paper • 2410.12491 • Published • 4
Revealing the Barriers of Language Agents in Planning
Paper • 2410.12409 • Published • 23
Learning to Compress: Local Rank and Information Compression in Deep Neural Networks
Paper • 2410.07687 • Published • 1
Grandmaster-Level Chess Without Search
Paper • 2402.04494 • Published • 67
Instruction-Driven Game Engine: A Poker Case Study
Paper • 2410.13441 • Published • 1
Transformer Guided Coevolution: Improved Team Formation in Multiagent Adversarial Games
Paper • 2410.13769 • Published • 1
Learning Graph Quantized Tokenizers for Transformers
Paper • 2410.13798 • Published • 1
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
Paper • 2410.13643 • Published
Learning to Route with Confidence Tokens
Paper • 2410.13284 • Published • 1
An Evolved Universal Transformer Memory
Paper • 2410.13166 • Published • 1
Artificial Kuramoto Oscillatory Neurons
Paper • 2410.13821 • Published • 1
TopoLM: brain-like spatio-functional organization in a topographic language model
Paper • 2410.11516 • Published • 1
Autoregressive Image Generation without Vector Quantization
Paper • 2406.11838 • Published • 2
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper • 2404.16710 • Published • 73
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Paper • 2410.12189 • Published • 1
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Paper • 2410.13276 • Published • 24
Do LLMs "know" internally when they follow instructions?
Paper • 2410.14516 • Published • 1
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Paper • 2410.10846 • Published • 2
One-Step Diffusion Distillation through Score Implicit Matching
Paper • 2410.16794 • Published • 1
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
Paper • 2405.18400 • Published • 1
Lightweight Neural App Control
Paper • 2410.17883 • Published • 8
Literature Meets Data: A Synergistic Approach to Hypothesis Generation
Paper • 2410.17309 • Published • 1
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Paper • 2410.18076 • Published • 4
Note: Encodes interaction trajectories into "skill vectors" that act like abstract concepts: a skill decoder (low-level policy) translates them into specific actions based on the current state, similar to how our concepts become concrete actions in different situations. By relabeling experiences with these skills, they train a high-level policy to select optimal skills that maximize rewards. This hierarchical approach hints at the possibility of AI systems formulating and thinking in their own curated abstract concepts, as in the sketch below.
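A minimal sketch of that hierarchy (the encoder, decoder, and value estimate are toy stand-ins, not the paper's models): skills are distilled from unlabeled trajectories, a high-level policy picks among them, and the low-level decoder turns the chosen skill into a state-specific action.

```python
import numpy as np

SKILL_DIM, STATE_DIM, ACTION_DIM = 4, 8, 2

def encode_skill(trajectory: np.ndarray) -> np.ndarray:
    """Compress a (T, STATE_DIM) trajectory into an abstract skill vector."""
    return np.tanh(trajectory.mean(axis=0)[:SKILL_DIM])

def skill_decoder(state: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Low-level policy: the same skill yields different actions per state."""
    w = np.outer(z, state).sum(axis=0)       # toy state conditioning
    return np.tanh(w[:ACTION_DIM])

def high_level_policy(state: np.ndarray, skills: list) -> np.ndarray:
    """Pick the skill with the highest (toy) value estimate for this state."""
    return max(skills, key=lambda z: float(state[:SKILL_DIM] @ z))

# Skills distilled from unlabeled prior trajectories, then selected online.
skills = [encode_skill(np.random.randn(10, STATE_DIM)) for _ in range(3)]
state = np.random.randn(STATE_DIM)
z = high_level_policy(state, skills)
print(skill_decoder(state, z))               # concrete action for this state
```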
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
Paper • 2410.17856 • Published • 48
Non-myopic Generation of Language Model for Reasoning and Planning
Paper • 2410.17195 • Published • 1
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
Paper • 2410.17434 • Published • 24
Unbounded: A Generative Infinite Game of Character Life Simulation
Paper • 2410.18975 • Published • 34
ToolGen: Unified Tool Retrieval and Calling via Generation
Paper • 2410.03439 • Published • 1
Accelerating Exploration with Unlabeled Prior Data
Paper • 2311.05067 • Published • 1
Note: Random network distillation as an extra reward to encourage exploration in RL.
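Random network distillation is simple enough to sketch directly (the network sizes here are arbitrary): a frozen random "target" network embeds each state, a trained "predictor" chases it, and the prediction error, large on novel states and shrinking with familiarity, is added to the reward as an exploration bonus.

```python
import torch
import torch.nn as nn

STATE_DIM, EMBED_DIM = 8, 16
target = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, EMBED_DIM))
predictor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, EMBED_DIM))
for p in target.parameters():
    p.requires_grad_(False)          # target stays random and frozen

opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

def intrinsic_reward(state: torch.Tensor) -> float:
    """Novelty bonus: predictor error against the frozen random target."""
    err = (predictor(state) - target(state)).pow(2).mean()
    opt.zero_grad(); err.backward(); opt.step()   # train predictor online
    return float(err.detach())

s = torch.randn(STATE_DIM)
print(intrinsic_reward(s))   # large at first ...
for _ in range(200):
    intrinsic_reward(s)
print(intrinsic_reward(s))   # ... shrinks as the state becomes familiar
```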
Efficient Online Reinforcement Learning with Offline Data
Paper • 2302.02948 • Published • 2
Note: Reusing previous experience to increase RL learning efficiency.
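A minimal sketch of one common way to do this, mixing each training batch between a fixed offline dataset and the growing online replay buffer (the 50/50 split and the transition format are illustrative assumptions, not necessarily the paper's exact recipe):

```python
import random

offline_data = [("s_off", "a", 1.0, "s2")] * 1000   # fixed prior dataset
online_buffer = []                                  # filled during interaction

def sample_batch(batch_size: int = 8):
    """Draw half of each gradient batch from offline data, half online."""
    half = batch_size // 2
    batch = random.choices(offline_data, k=half)
    if online_buffer:
        batch += random.choices(online_buffer, k=batch_size - half)
    return batch

online_buffer.append(("s_on", "a", 0.0, "s3"))
print(sample_batch())   # mixed offline/online transitions for the update
```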
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Paper • 2410.17891 • Published • 15
Diffusion for World Modeling: Visual Details Matter in Atari
Paper • 2405.12399 • Published • 27
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Paper • 2410.13835 • Published • 1
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Paper • 2410.17247 • Published • 43
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Paper • 2410.10812 • Published • 14
MCSD: An Efficient Language Model with Diverse Fusion
Paper • 2406.12230 • Published • 1
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
Paper • 2410.16770 • Published • 1
Pyramidal Flow Matching for Efficient Video Generative Modeling
Paper • 2410.05954 • Published • 37
Energy-Based Diffusion Language Models for Text Generation
Paper • 2410.21357 • Published • 1
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 12
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Paper • 2410.01131 • Published • 8
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 43
Inference Optimal VLMs Need Only One Visual Token but Larger Models
Paper • 2411.03312 • Published • 5
DroidSpeak: Enhancing Cross-LLM Communication
Paper • 2411.02820 • Published • 1
Wave Network: An Ultra-Small Language Model
Paper • 2411.02674 • Published • 3
Thinking Forward and Backward: Effective Backward Planning with Large Language Models
Paper • 2411.01790 • Published • 1
Adaptive Length Image Tokenization via Recurrent Allocation
Paper • 2411.02393 • Published • 11
Note: Uses a fixed set of tokens to encode the image, adding new tokens recursively until reaching a satisfactory compression level.
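A minimal sketch of the allocation loop (the encoder, error metric, and threshold are toy stand-ins, not the paper's models): keep growing the token budget and re-encoding until reconstruction is good enough, so easy images end up with few tokens and hard ones with more.

```python
import numpy as np

def encode(image: np.ndarray, n_tokens: int) -> np.ndarray:
    """Stand-in encoder: keep the n largest-magnitude coefficients."""
    flat = image.flatten()
    idx = np.argsort(-np.abs(flat))[:n_tokens]
    tokens = np.zeros_like(flat)
    tokens[idx] = flat[idx]
    return tokens

def reconstruction_error(image: np.ndarray, tokens: np.ndarray) -> float:
    return float(np.abs(image.flatten() - tokens).mean())

image = np.random.randn(8, 8)
n_tokens, threshold = 4, 0.05
while reconstruction_error(image, encode(image, n_tokens)) > threshold:
    n_tokens += 4                      # recurrent step: grow the budget
print(f"allocated {n_tokens} tokens")
```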
Improving Steering Vectors by Targeting Sparse Autoencoder Features
Paper • 2411.02193 • Published • 1
How Far is Video Generation from World Model: A Physical Law Perspective
Paper • 2411.02385 • Published • 27
Tool Learning with Foundation Models
Paper • 2304.08354 • Published • 2