hcwei

AI & ML interests

Diffusion Model, Image Generation, ML, DL, CV

Recent Activity

upvoted a paper 2 days ago

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

upvoted a paper 2 days ago

Training-Free Reasoning and Reflection in MLLMs

commented on a paper 2 days ago

Training-Free Reasoning and Reflection in MLLMs

Organizations

None yet

hcwei's activity

upvoted 2 papers 2 days ago

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published 3 days ago • 47

Training-Free Reasoning and Reflection in MLLMs

Paper • 2505.16151 • Published 3 days ago • 7

upvoted 5 papers about 1 month ago

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Paper • 2504.13122 • Published Apr 17 • 21

Antidistillation Sampling

Paper • 2504.13146 • Published Apr 17 • 61

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

Paper • 2504.13169 • Published Apr 17 • 39

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17 • 49

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Paper • 2504.06514 • Published Apr 9 • 39

upvoted 2 papers 4 months ago

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

Paper • 2501.07888 • Published Jan 14 • 15

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 291

upvoted 2 papers 5 months ago

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published Dec 24, 2024 • 76

No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 44

upvoted 2 papers 8 months ago

Visual Context Window Extension: A New Perspective for Long Video Understanding

Paper • 2409.20018 • Published Sep 30, 2024 • 11

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding

Paper • 2409.18111 • Published Sep 26, 2024 • 7

upvoted a paper 10 months ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 61

upvoted a collection 10 months ago

LMMs-Eval

Collection

Dataset Collection of LMMs-Eval • 36 items • Updated Oct 4, 2024 • 29

upvoted 2 papers 12 months ago

Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published Jun 10, 2024 • 30

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 76

upvoted a paper about 1 year ago

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8, 2024 • 46