Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2507.01006

GLM-4.1V-Thinking

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 3 days ago • 157
THUDM/GLM-4.1V-9B-Thinking

Image-Text-to-Text • 10B • Updated about 22 hours ago • 1.96k • 171
THUDM/GLM-4.1V-9B-Base

Image-Text-to-Text • 10B • Updated about 22 hours ago • 64 • 20
Running

17

17

GLM-4.1V-9B-Thinking-API-Demo

🚀

THUDM/GLM-4.1V-9B-Thinking Demo

about 22 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

MiMo-VL Technical Report

Paper • 2506.03569 • Published about 1 month ago • 73
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published 9 days ago • 57
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 3 days ago • 157
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation

Paper • 2506.19852 • Published 10 days ago • 31

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 10
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11 • 29
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29 • 10
AI Agents: Evolution, Architecture, and Real-World Applications

Paper • 2503.12687 • Published Mar 16 • 2

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics

End-to-End Goal-Driven Web Navigation

Paper • 1602.02261 • Published Feb 6, 2016
Learning Language Games through Interaction

Paper • 1606.02447 • Published Jun 8, 2016
Naturalizing a Programming Language via Interactive Learning

Paper • 1704.06956 • Published Apr 23, 2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Paper • 1802.08802 • Published Feb 24, 2018 • 1

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 8
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 40

about 2 hours ago

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published 4 days ago • 16
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 35
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 78
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 3 days ago • 157

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Paper • 2505.10046 • Published May 15 • 9
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published 15 days ago • 84
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 3 days ago • 157

Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published Apr 10 • 29
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability

Paper • 2504.08003 • Published Apr 9 • 49
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 29
Towards Learning to Complete Anything in Lidar

Paper • 2504.12264 • Published Apr 16 • 10

GLM-4.1V-Thinking

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 3 days ago • 157
THUDM/GLM-4.1V-9B-Thinking

Image-Text-to-Text • 10B • Updated about 22 hours ago • 1.96k • 171
THUDM/GLM-4.1V-9B-Base

Image-Text-to-Text • 10B • Updated about 22 hours ago • 64 • 20
Running

17

17

GLM-4.1V-9B-Thinking-API-Demo

🚀

THUDM/GLM-4.1V-9B-Thinking Demo

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics

End-to-End Goal-Driven Web Navigation

Paper • 1602.02261 • Published Feb 6, 2016
Learning Language Games through Interaction

Paper • 1606.02447 • Published Jun 8, 2016
Naturalizing a Programming Language via Interactive Learning

Paper • 1704.06956 • Published Apr 23, 2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Paper • 1802.08802 • Published Feb 24, 2018 • 1

about 22 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17 • 8
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4 • 22
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 40

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 5

about 2 hours ago

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published 4 days ago • 16
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 35
Time Blindness: Why Video-Language Models Can't See What Humans Can?

Paper • 2505.24867 • Published May 30 • 78
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 3 days ago • 157

MiMo-VL Technical Report

Paper • 2506.03569 • Published about 1 month ago • 73
MMSearch-R1: Incentivizing LMMs to Search

Paper • 2506.20670 • Published 9 days ago • 57
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 3 days ago • 157
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation

Paper • 2506.19852 • Published 10 days ago • 31

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Paper • 2505.10046 • Published May 15 • 9
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published 15 days ago • 84
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published 3 days ago • 157

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 10
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11 • 29
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29 • 10
AI Agents: Evolution, Architecture, and Real-World Applications

Paper • 2503.12687 • Published Mar 16 • 2

Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published Apr 10 • 29
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability

Paper • 2504.08003 • Published Apr 9 • 49
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Paper • 2504.11468 • Published Apr 10 • 29
Towards Learning to Complete Anything in Lidar

Paper • 2504.12264 • Published Apr 16 • 10

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs