WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Paper • 2411.17459 • Published • 11
MAGVIT: Masked Generative Video Transformer
Paper • 2212.05199 • Published
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Paper • 2310.05737 • Published • 4
Finite Scalar Quantization: VQ-VAE Made Simple
Paper • 2309.15505 • Published • 22
Inui
Norm
AI & ML interests
Video Diffusion; Large Language Model; Object Detection; OCR
Recent Activity
upvoted a paper about 6 hours ago
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
upvoted a paper 5 days ago
One-shot Entropy Minimization
upvoted a paper 13 days ago
MMaDA: Multimodal Large Diffusion Language Models
Organizations
Collections: 9
Papers: 1
Models: 2
Datasets: 0 (None public yet)