GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published 12 days ago • 150
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9 • 45
Play to Generalize: Learning to Reason Through Game Play Paper • 2506.08011 • Published Jun 9 • 15
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11 • 130
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 30 days ago • 528
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22 • 123
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 416
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Paper • 2501.12909 • Published Jan 22 • 72
Diffusion Adversarial Post-Training for One-Step Video Generation Paper • 2501.08316 • Published Jan 14 • 36
VideoAuteur: Towards Long Narrative Video Generation Paper • 2501.06173 • Published Jan 10 • 34
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024 • 54
ViTamin Family Collection Designing Scalable Vision Models in the Vision-language Era. The best performing model is 'jienengchen/ViTamin-XL-384px'. • 16 items • Updated Apr 11, 2024 • 8
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter Paper • 2402.10896 • Published Feb 16, 2024 • 16