FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 21 • 4
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization Paper • 2303.14189 • Published Mar 24, 2023 • 4
MobileOne: An Improved One millisecond Mobile Backbone Paper • 2206.04040 • Published Jun 8, 2022
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding Paper • 2310.15308 • Published Oct 23, 2023 • 23
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training Paper • 2311.17049 • Published Nov 28, 2023 • 2
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions Paper • 2407.06723 • Published Jul 9, 2024 • 11
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum Paper • 2405.13226 • Published May 21, 2024 • 1
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks Paper • 2405.08911 • Published May 14, 2024 • 1
Article WWDC 24: Running Mistral 7B with Core ML By FL33TW00D-HF and 3 others • Jul 22, 2024 • 61
MobileCLIP Models + DataCompDR Data Collection MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4, 2024 • 29