Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published 20 days ago • 14
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Paper • 2412.04445 • Published 20 days ago • 21
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper • 2409.04410 • Published Sep 6 • 23
SEED-Story: Multimodal Long Story Generation with Large Language Model Paper • 2407.08683 • Published Jul 11 • 22
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published Jun 18 • 29
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published May 13 • 16
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Paper • 2404.16790 • Published Apr 25 • 7
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published Apr 22 • 18