-
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 157 -
THUDM/GLM-4.1V-9B-Thinking
Image-Text-to-Text • 10B • Updated • 1.96k • 171 -
THUDM/GLM-4.1V-9B-Base
Image-Text-to-Text • 10B • Updated • 64 • 20 -
17
GLM-4.1V-9B-Thinking-API-Demo
🚀THUDM/GLM-4.1V-9B-Thinking Demo
Collections
Discover the best community collections!
Collections including paper arxiv:2507.01006
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
-
MiMo-VL Technical Report
Paper • 2506.03569 • Published • 73 -
MMSearch-R1: Incentivizing LMMs to Search
Paper • 2506.20670 • Published • 57 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 157 -
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation
Paper • 2506.19852 • Published • 31
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 29 -
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper • 2503.23157 • Published • 10 -
AI Agents: Evolution, Architecture, and Real-World Applications
Paper • 2503.12687 • Published • 2
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 1
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 8 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 40
-
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
Paper • 2506.23918 • Published • 16 -
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Paper • 2504.16030 • Published • 35 -
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Paper • 2505.24867 • Published • 78 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 157
-
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Paper • 2505.10046 • Published • 9 -
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 84 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 157
-
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models
Paper • 2504.07951 • Published • 29 -
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Paper • 2504.08003 • Published • 49 -
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • 2504.11468 • Published • 29 -
Towards Learning to Complete Anything in Lidar
Paper • 2504.12264 • Published • 10
-
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 157 -
THUDM/GLM-4.1V-9B-Thinking
Image-Text-to-Text • 10B • Updated • 1.96k • 171 -
THUDM/GLM-4.1V-9B-Base
Image-Text-to-Text • 10B • Updated • 64 • 20 -
17
GLM-4.1V-9B-Thinking-API-Demo
🚀THUDM/GLM-4.1V-9B-Thinking Demo
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 1
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Paper • 2502.11573 • Published • 8 -
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published • 22 -
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper • 2502.11775 • Published • 9 -
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 40
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
-
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
Paper • 2506.23918 • Published • 16 -
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Paper • 2504.16030 • Published • 35 -
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Paper • 2505.24867 • Published • 78 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 157
-
MiMo-VL Technical Report
Paper • 2506.03569 • Published • 73 -
MMSearch-R1: Incentivizing LMMs to Search
Paper • 2506.20670 • Published • 57 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 157 -
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation
Paper • 2506.19852 • Published • 31
-
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Paper • 2505.10046 • Published • 9 -
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 84 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 157
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 29 -
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Paper • 2503.23157 • Published • 10 -
AI Agents: Evolution, Architecture, and Real-World Applications
Paper • 2503.12687 • Published • 2
-
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models
Paper • 2504.07951 • Published • 29 -
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Paper • 2504.08003 • Published • 49 -
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • 2504.11468 • Published • 29 -
Towards Learning to Complete Anything in Lidar
Paper • 2504.12264 • Published • 10