X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Paper • 2505.03981 • Published May 6 • 15
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published May 7 • 27
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10 • 29
Article What is test-time compute and how to scale it? By Kseniase and 1 other • Feb 6 • 96
ViLBench: A Suite for Vision-Language Process Reward Modeling Paper • 2503.20271 • Published Mar 26 • 7
VHELM: A Holistic Evaluation of Vision Language Models Paper • 2410.07112 • Published Oct 9, 2024 • 3
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23, 2024 • 39
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024 • 57
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 42
CodecLM: Aligning Language Models with Tailored Synthetic Data Paper • 2404.05875 • Published Apr 8, 2024 • 18
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8, 2024 • 39
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 33
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023 • 33
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs Paper • 2311.16101 • Published Nov 27, 2023 • 1
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue Paper • 2305.13602 • Published May 23, 2023 • 1
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models Paper • 2310.13671 • Published Oct 20, 2023 • 19