Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published Jun 23 • 33
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding Paper • 2506.05551 • Published Jun 5 • 5
view article Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 28 days ago • 606
Rope to Nope and Back Again: A New Hybrid Attention Strategy Paper • 2501.18795 • Published Jan 30 • 6
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 23
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models Paper • 2403.12881 • Published Mar 19, 2024 • 18
AgentTuning: Enabling Generalized Agent Abilities for LLMs Paper • 2310.12823 • Published Oct 19, 2023 • 36
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach Paper • 2410.03160 • Published Oct 4, 2024 • 5
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 161
ReLU^2 Wins: Discovering Efficient Activation Functions for Sparse LLMs Paper • 2402.03804 • Published Feb 6, 2024 • 4
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 45
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models Paper • 2402.13516 • Published Feb 21, 2024 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434 • Published May 7, 2024 • 22
Neural Machine Translation by Jointly Learning to Align and Translate Paper • 1409.0473 • Published Sep 1, 2014 • 7
Effective Approaches to Attention-based Neural Machine Translation Paper • 1508.04025 • Published Aug 17, 2015 • 3