Yiming Wu (weleen)

AI & ML interests
Computer Vision

Recent Activity
- updated a model 2 days ago: weleen/grab_bread_and_put
- published a model 2 days ago: weleen/grab_bread_and_put
- updated a dataset 4 days ago: weleen/take_the_banana_and_insert_into_the_bottle
Organizations
foundation model
- DreamLLM: Synergistic Multimodal Comprehension and Creation
  Paper • 2309.11499 • Published • 59
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
  Paper • 2405.08344 • Published • 16
aigc
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
  Paper • 2311.10709 • Published • 26
- Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
  Paper • 2405.12970 • Published • 26
- FIFO-Diffusion: Generating Infinite Videos from Text without Training
  Paper • 2405.11473 • Published • 58
- stabilityai/stable-diffusion-3-medium
  Text-to-Image • Updated • 13k • 4.83k
aigc acceleration
- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
  Paper • 2403.16627 • Published • 22
- Phased Consistency Model
  Paper • 2405.18407 • Published • 49
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 34
- Imp: Highly Capable Large Multimodal Models for Mobile Devices
  Paper • 2405.12107 • Published • 30
aigc benchmark
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
  Paper • 2407.14505 • Published • 27
- MIBench: Evaluating Multimodal Large Language Models over Multiple Images
  Paper • 2407.15272 • Published • 10
- A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data
  Paper • 2407.16680 • Published • 12
miscellaneous
datasets
gs