CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data Paper • 2404.15653 • Published Apr 24 • 26
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 46
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 28
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published May 23 • 17
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras Paper • 2405.14866 • Published May 23 • 6
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Paper • 2405.17991 • Published May 28 • 12
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Paper • 2406.00392 • Published Jun 1 • 12
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4 • 37
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published Jun 5 • 8
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Paper • 2406.02900 • Published Jun 5 • 11
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published Jun 6 • 20
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7 • 27
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? Paper • 2406.04391 • Published Jun 6 • 7
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach Paper • 2406.04594 • Published Jun 7 • 5
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6 • 56
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published Jun 12 • 23
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published Jun 13 • 18
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding Paper • 2406.09297 • Published Jun 13 • 4
A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression Paper • 2406.11430 • Published Jun 17 • 22
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20 • 32
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Paper • 2406.14563 • Published Jun 20 • 29
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published Jun 18 • 36
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1 • 75
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jun 29 • 34
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Paper • 2407.01791 • Published Jul 1 • 5
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models Paper • 2407.02687 • Published Jul 2 • 22
Learning to (Learn at Test Time): RNNs with Expressive Hidden States Paper • 2407.04620 • Published Jul 5 • 27
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published Jul 8 • 20
An accurate detection is not all you need to combat label noise in web-noisy datasets Paper • 2407.05528 • Published Jul 8 • 3
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging Paper • 2407.07315 • Published Jul 10 • 6
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning Paper • 2407.07523 • Published Jul 10 • 4
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients Paper • 2407.11239 • Published Jul 15 • 7
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models Paper • 2407.11062 • Published Jul 10 • 8
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18 • 33
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22 • 19
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19 • 17
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines Paper • 2408.01050 • Published Aug 2 • 8
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling Paper • 2408.04810 • Published Aug 9 • 22
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12 • 16
Heavy Labels Out! Dataset Distillation with Label Space Lightening Paper • 2408.08201 • Published Aug 15 • 18
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23 • 25
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26 • 35
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28 • 42
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published Sep 10 • 14
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16 • 39
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19 • 47
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19 • 24
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published Sep 24 • 12
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published Sep 25 • 27
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1 • 144
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond Paper • 2410.02362 • Published Oct 3 • 17
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4 • 4
MLP-KAN: Unifying Deep Representation and Function Learning Paper • 2410.03027 • Published Oct 3 • 29
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published Oct 7 • 44
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning Paper • 2410.06373 • Published Oct 8 • 35
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Paper • 2410.07170 • Published Oct 9 • 15
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published Oct 3 • 47
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published Sep 28 • 19
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs Paper • 2410.05265 • Published Oct 7 • 29
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13 • 54
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14 • 48
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17 • 74
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning Paper • 2410.13618 • Published Oct 17 • 6
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs Paper • 2410.13276 • Published Oct 17 • 25
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22 • 45
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 89
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark Paper • 2410.19168 • Published Oct 24 • 19
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25 • 19
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Paper • 2410.21264 • Published Oct 28 • 9
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction Paper • 2410.18481 • Published Oct 24 • 5
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA Paper • 2410.20672 • Published Oct 28 • 6
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23 • 200
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Paper • 2410.21465 • Published Oct 28 • 11
Accelerating Direct Preference Optimization with Prefix Sharing Paper • 2410.20305 • Published Oct 27 • 6
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos Paper • 2410.23287 • Published Oct 30 • 19
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents Paper • 2410.22476 • Published Oct 29 • 25
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Paper • 2410.20650 • Published Oct 28 • 16
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation Paper • 2410.21157 • Published Oct 28 • 6
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4 • 46
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4 • 33
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Paper • 2411.02335 • Published Nov 4 • 11
Controlling Language and Diffusion Models by Transporting Activations Paper • 2410.23054 • Published Oct 30 • 16
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7 • 37
Can sparse autoencoders be used to decompose and interpret steering vectors? Paper • 2411.08790 • Published Nov 13 • 8
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published Nov 18 • 19
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers Paper • 2411.10510 • Published Nov 15 • 8
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published Nov 17 • 50
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Paper • 2411.13476 • Published Nov 20 • 15
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21 • 43
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published Nov 21 • 58
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published about 1 month ago • 15
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published 30 days ago • 47
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation Paper • 2411.18499 • Published 28 days ago • 18
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning Paper • 2412.00568 • Published 25 days ago • 14
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model Paper • 2411.17459 • Published 30 days ago • 10
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published 15 days ago • 20
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer Paper • 2412.07720 • Published 15 days ago • 30
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models Paper • 2412.06071 • Published 17 days ago • 7
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens Paper • 2412.10208 • Published 13 days ago • 19
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 13 days ago • 75
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 10 days ago • 41
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Paper • 2412.17153 • Published 3 days ago • 28
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published 2 days ago • 21