A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published 22 days ago • 233
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024 • 43
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning Paper • 2507.14137 • Published 21 days ago • 32
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published 29 days ago • 60
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 84
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 210
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models Paper • 2503.19355 • Published Mar 25 • 2
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO Paper • 2506.07464 • Published Jun 9 • 13
view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • Jul 9 • 639
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2 • 34
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 57
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Paper • 2504.10449 • Published Apr 14 • 14
view article Article Bamba: Inference-Efficient Hybrid Mamba2 Model By rganti and 28 others • Dec 18, 2024 • 58
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings Paper • 2506.23115 • Published Jun 29 • 36
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published Jun 26 • 28
Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? Paper • 2506.17417 • Published Jun 20 • 11
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 128
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Paper • 2403.19651 • Published Mar 28, 2024 • 23
Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure Paper • 2506.12278 • Published Jun 13 • 17