Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details Paper • 2506.16504 • Published 27 days ago • 23
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published 28 days ago • 37
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8 • 108
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World? Paper • 2506.05287 • Published Jun 5 • 15
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Paper • 2505.14640 • Published May 20 • 15
SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations Paper • 2505.02094 • Published May 4 • 19
Efficient MoE Align & Sort design in SGLang Fused MoE Article • By yiakwy-xpu-team • Mar 25 • 3
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding Paper • 2504.01943 • Published Apr 2 • 15
From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate Article • By muellerzr and 3 others • Jun 13, 2024 • 55
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published Feb 19 • 28
VideoLLaMA3 Collection Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated 27 days ago • 14
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 91
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 295
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 48
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 107