Tony Wu

tonywu71

AI & ML interests

LLM, Multimodal, Agents, Information Retrieval, RAG, Speech

Recent Activity

authored a paper 3 days ago

Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

liked a model 4 days ago

Hcompany/Holo1-3B

upvoted a collection 4 days ago

Holo1

tonywu71's activity

upvoted a collection 4 days ago

Holo1

Collection

Vision-Language Action Model for use in Surfer-H web navigation agent • 5 items • Updated 3 days ago • 39

upvoted an article 17 days ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

and 6 others •

18 days ago

• 140

upvoted an article 19 days ago

Article

Preference Optimization for Vision Language Models

and 3 others •

Jul 10, 2024

• 76

upvoted an article 26 days ago

Article

Vision Language Models (Better, Faster, Stronger)

and 4 others •

27 days ago

• 420

upvoted an article about 2 months ago

Article

Gotchas in Tokenizer Behavior Every Developer Should Know

Apr 18

• 37

upvoted a paper about 2 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 268

upvoted a paper 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 188

upvoted an article 2 months ago

Article

ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval

and 2 others •

Mar 18

• 10

upvoted an article 3 months ago

Article

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Jul 29, 2024

• 329

upvoted a paper 4 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 144

upvoted 2 articles 4 months ago

Article

SigLIP 2: A better multilingual vision language encoder

and 2 others •

Feb 21

• 165

Article

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

and 2 others •

Feb 19

• 70

upvoted a paper 4 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 232

upvoted 2 articles 4 months ago

Article

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

and 3 others •

Feb 4

• 158

Article

Open-source DeepResearch – Freeing our search agents

and 4 others •

Feb 4

• 1.25k

upvoted an article 5 months ago

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

and 2 others •

Jan 23

• 180

upvoted a paper 5 months ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 146

upvoted a paper 6 months ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 150

upvoted an article 7 months ago

Article

Visually Multilingual: Introducing mcdse-2b

Oct 27, 2024

• 41

upvoted a collection 8 months ago

Leaderboards and benchmarks ✨

Collection

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 91 items • Updated Feb 28 • 108