Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient Paper • 2411.17787 • Published Nov 26, 2024 • 12
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation Paper • 2312.13108 • Published Dec 20, 2023 • 3
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024 • 32
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning Paper • 2203.10244 • Published Mar 19, 2022
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models Paper • 2411.00492 • Published Nov 1, 2024 • 6
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision Paper • 2309.14181 • Published Sep 25, 2023 • 2
Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach Paper • 2305.12726 • Published May 22, 2023
Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models Paper • 2312.15300 • Published Dec 23, 2023 • 2
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels Paper • 2312.17090 • Published Dec 28, 2023 • 4
A Benchmark for Multi-modal Foundation Models on Low-level Vision: from Single Images to Pairs Paper • 2402.07116 • Published Feb 11, 2024 • 2
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives Paper • 2211.04894 • Published Nov 9, 2022
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning Paper • 2408.07931 • Published Aug 15, 2024 • 21
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising Paper • 2406.06911 • Published Jun 11, 2024 • 11
GFlow: Recovering 4D World from Monocular Video Paper • 2405.18426 • Published May 28, 2024 • 15
ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors Paper • 2312.13324 • Published Dec 20, 2023 • 11
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 28