Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 12 days ago • 61
Adaptive Orchestration for Large-Scale Inference on Heterogeneous Accelerator Systems Balancing Cost, Performance, and Resilience Paper • 2503.20074 • Published 29 days ago • 5
view article Article NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets Mar 18 • 35
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 • 398
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 9 items • Updated 7 days ago • 117
view article Article From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub Feb 12 • 64
view article Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Jan 16 • 72