Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 150
A Simple Aerial Detection Baseline of Multimodal Language Models Paper • 2501.09720 • Published Jan 16, 2025 • 1
PointOBB: Learning Oriented Object Detection via Single Point Supervision Paper • 2311.14757 • Published Nov 23, 2023
H2RBox-v2: Incorporating Symmetry for Boosting Horizontal Box Supervised Oriented Object Detection Paper • 2304.04403 • Published Apr 10, 2023
ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection Paper • 2303.04989 • Published Mar 9, 2023
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision Paper • 2311.14758 • Published Nov 23, 2023
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World Paper • 2402.19474 • Published Feb 29, 2024 • 2
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 12, 2024 • 29
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World Paper • 2308.01907 • Published Aug 3, 2023 • 12
InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language Paper • 2305.05662 • Published May 9, 2023 • 4