Om AI Lab

Team

company

https://github.com/om-ai-lab

OmAI_lab

om-ai-lab

AI & ML interests

Multimodal AI, VLM, VLA, VAM, etc.

Recent Activity

tianchez authored a paper 19 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

tianchez authored a paper 19 days ago

Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types

tianchez authored a paper 19 days ago

ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG

Papers

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

Articles

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

20 days ago • 12

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

21 days ago • 14

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

22 days ago • 15

Trials, Errors, and Breakthroughs: Our Rocky Road to OVD SOTA with Reinforcement Learning

Mar 25, 2025 • 3

Improving Object Detection through Reinforcement Learning with VLM-R1

Mar 25, 2025 • 4

tianchez

authored 7 papers 19 days ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published Apr 10, 2025 • 36

Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types

Paper • 2412.17867 • Published Dec 21, 2024 • 2

ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG

Paper • 2411.07688 • Published Nov 12, 2024

VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

Paper • 2509.25916 • Published Sep 30, 2025 • 6

GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing

Paper • 2503.12490 • Published Mar 16, 2025

MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

Paper • 2603.12266 • Published Mar 12 • 19

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Paper • 2605.28132 • Published May 27 • 25

P3ngLiu

published an article 20 days ago

Article

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

omlab • 20 days ago • 12

P3ngLiu

published an article 21 days ago

Article

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

omlab • 21 days ago • 14

tianchez

published an article 22 days ago

Article

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

omlab • 22 days ago • 15

Heting

updated a model about 1 month ago

omlab/OmTrackVLA-0.6B

Other • 0.6B • Updated about 1 month ago • 161 • 5

P3ngLiu

updated a collection about 1 month ago

OmDet-Turbo-Models

Collection

A collection of OmDet-Turbo Models. • 1 item • Updated about 1 month ago

P3ngLiu

updated a model about 1 month ago

omlab/VLM-FO1-3B-v01

Object Detection • 4B • Updated about 1 month ago • 205 • 17

tianchez

submitted a paper to Daily Papers about 2 months ago

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Paper • 2605.28132 • Published May 27 • 25

kyusonglee

updated a model 3 months ago

omlab/OmTrackVLA-0.6B

Other • 0.6B • Updated about 1 month ago • 161 • 5

Zilun

updated a dataset 5 months ago

omlab/SARDet_REC6_NORM-FS

Viewer • Updated Feb 4 • 968 • 33

Zilun

published a dataset 5 months ago

omlab/SARDet_REC6_NORM-FS

Viewer • Updated Feb 4 • 968 • 33

Zilun

updated a dataset 5 months ago

omlab/SARDet_REC6-FS

Viewer • Updated Feb 4 • 968 • 18

Zilun

published a dataset 6 months ago

omlab/SARDet_REC6-FS

Viewer • Updated Feb 4 • 968 • 18

Zilun

updated a dataset 6 months ago

omlab/SARDet3-FS

Viewer • Updated Feb 1 • 270 • 16

Team members 3
