Submitted by xianbao 138 GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models · 171 authors 2.06k 3
Submitted by RyanL22 49 Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off · 2 authors 119 3
Submitted by SiriusL 24 InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization · 13 authors 19 2
Submitted by YerbaPage 18 Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal · 7 authors 5 3
Submitted by JorgeeGF 17 Hidden Dynamics of Massive Activations in Transformer Training · 5 authors 4
Submitted by MikolajZ 11 GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing · 4 authors 9 2
Submitted by hdong51 10 Adapting Vision-Language Models Without Labels: A Comprehensive Survey · 6 authors 19 2
Submitted by huxueyu 9 OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use · 29 authors 332 2
Submitted by fsk515 7 MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh · 9 authors 3
Submitted by KejiaRobust 6 MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs · 7 authors 2
Submitted by shijiezhou 6 VLM4D: Towards Spatiotemporal Awareness in Vision Language Models · 10 authors 2
Submitted by LianShuQuan 4 UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding · 7 authors 2
Submitted by thebluser 3 LightSwitch: Multi-view Relighting with Material-guided Diffusion · 3 authors 3