Andrea Gemelli

andreagemelli

https://www.andreagemelli.me

AI & ML interests

Natural Language Processing, Computer Vision, Generative Models, Document Analysis

Recent Activity

liked a model 3 months ago

letxbe/qwen2-7b-BoundingDocs-rephrased

upvoted a paper 1 day ago

How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Paper • 2507.01955 • Published 6 days ago • 22

upvoted a collection about 2 months ago

Qwen3

Collection

72 items • Updated 23 days ago • 834

upvoted a paper 3 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 192

upvoted an article 3 months ago

SmolVLM2: Bringing Video Understanding to Every Device

Article • and 6 others • Published Feb 20 • 280

upvoted an article 4 months ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Article • and 3 others • Published Mar 12 • 440

upvoted 3 collections 4 months ago

upvoted 2 articles 5 months ago

SmolLM - blazingly fast and remarkably powerful

Article • and 2 others • Published Jul 16, 2024 • 387

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • and 2 others • Published Jan 23 • 181

upvoted a paper 5 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 235

upvoted 2 articles 5 months ago

SmolVLM - small yet mighty Vision Language Model

Article • and 4 others • Published Nov 26, 2024 • 329

Open-R1: a fully open reproduction of DeepSeek-R1

Article • and 2 others • Published Jan 28 • 872

upvoted 2 papers 6 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 405

BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations

Paper • 2501.03403 • Published Jan 6 • 4

upvoted 2 papers 7 months ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 150

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 132

upvoted an article 8 months ago

Releasing the largest multilingual open pretraining dataset

Article • and 2 others • Published Nov 13, 2024 • 102

upvoted a paper 10 months ago

One missing piece in Vision and Language: A Survey on Comics Understanding

Paper • 2409.09502 • Published Sep 14, 2024 • 26

upvoted an article about 1 year ago

Let's talk about LLM evaluation

Article • Published May 23, 2024 • 178
