Gen AI Diffusion

Stalin16 's Collections

Edu

Agents

T2I Models

Model Evaluation

Reasoning Models

Data and other things

updated May 20

Upvote

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published Oct 14, 2024 • 57
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published Nov 7, 2024 • 71
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

Paper • 2411.04709 • Published Nov 5, 2024 • 27
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Paper • 2410.07171 • Published Oct 9, 2024 • 43
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

Paper • 2410.06244 • Published Oct 8, 2024 • 20
How Far is Video Generation from World Model: A Physical Law Perspective

Paper • 2411.02385 • Published Nov 4, 2024 • 34
Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published Nov 4, 2024 • 25
AutoVFX: Physically Realistic Video Editing from Natural Language Instructions

Paper • 2411.02394 • Published Nov 4, 2024 • 16
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Paper • 2410.22366 • Published Oct 28, 2024 • 84
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Paper • 2410.10812 • Published Oct 14, 2024 • 18
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control

Paper • 2410.13830 • Published Oct 17, 2024 • 26
SVDQunat: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Paper • 2411.05007 • Published Nov 7, 2024 • 25
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Paper • 2411.07232 • Published Nov 11, 2024 • 67
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Paper • 2411.07199 • Published Nov 11, 2024 • 50
MagicQuill: An Intelligent Interactive Image Editing System

Paper • 2411.09703 • Published Nov 14, 2024 • 80
AnimateAnything: Consistent and Controllable Animation for Video Generation

Paper • 2411.10836 • Published Nov 16, 2024 • 24
Stylecodes: Encoding Stylistic Information For Image Generation

Paper • 2411.12811 • Published Nov 19, 2024 • 12
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement

Paper • 2411.15115 • Published Nov 22, 2024 • 10
Style-Friendly SNR Sampler for Style-Driven Generation

Paper • 2411.14793 • Published Nov 22, 2024 • 40
OminiControl: Minimal and Universal Control for Diffusion Transformer

Paper • 2411.15098 • Published Nov 22, 2024 • 61
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published Dec 2, 2024 • 13
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

Paper • 2412.02687 • Published Dec 3, 2024 • 114
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training

Paper • 2412.02030 • Published Dec 2, 2024 • 19
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance

Paper • 2412.05355 • Published Dec 6, 2024 • 8
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Paper • 2412.04431 • Published Dec 5, 2024 • 17
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 48
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Paper • 2412.07674 • Published Dec 10, 2024 • 20
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Paper • 2412.07774 • Published Dec 10, 2024 • 30
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Paper • 2412.05148 • Published Dec 6, 2024 • 12
ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Paper • 2412.07721 • Published Dec 10, 2024 • 9
StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published Dec 10, 2024 • 20
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models

Paper • 2412.08629 • Published Dec 11, 2024 • 13
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Paper • 2412.09349 • Published Dec 12, 2024 • 8
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Paper • 2412.08580 • Published Dec 11, 2024 • 45
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Paper • 2412.09618 • Published Dec 12, 2024 • 21
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Paper • 2412.09622 • Published Dec 12, 2024 • 8
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

Paper • 2412.09626 • Published Dec 12, 2024 • 21
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published Dec 19, 2024 • 28
BrushEdit: All-In-One Image Inpainting and Editing

Paper • 2412.10316 • Published Dec 13, 2024 • 37
1.58-bit FLUX

Paper • 2412.18653 • Published Dec 24, 2024 • 88
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Paper • 2412.19712 • Published Dec 27, 2024 • 15
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

Paper • 2412.19645 • Published Dec 27, 2024 • 13
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

Paper • 2501.01427 • Published Jan 2, 2025 • 52
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

Paper • 2501.01320 • Published Jan 2, 2025 • 14
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning

Paper • 2501.04698 • Published Jan 8, 2025 • 15
Diffusion Adversarial Post-Training for One-Step Video Generation

Paper • 2501.08316 • Published Jan 14, 2025 • 36
3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering

Paper • 2501.05131 • Published Jan 9, 2025 • 37
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation

Paper • 2501.09503 • Published Jan 16, 2025 • 14
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16, 2025 • 72
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer

Paper • 2501.18427 • Published Jan 30, 2025 • 27
MatAnyone: Stable Video Matting with Consistent Memory Propagation

Paper • 2501.14677 • Published Jan 24, 2025 • 34
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Paper • 2502.03032 • Published Feb 5, 2025 • 59
DynVFX: Augmenting Real Videos with Dynamic Content

Paper • 2502.03621 • Published Feb 5, 2025 • 29
Generating Multi-Image Synthetic Data for Text-to-Image Customization

Paper • 2502.01720 • Published Feb 3, 2025 • 8
Magic 1-For-1: Generating One Minute Video Clips within One Minute

Paper • 2502.07701 • Published Feb 11, 2025 • 37
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3, 2025 • 225
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

Paper • 2502.10458 • Published Feb 12, 2025 • 38
PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data

Paper • 2502.14397 • Published Feb 20, 2025 • 41
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25, 2025 • 74
KV-Edit: Training-Free Image Editing for Precise Background Preservation

Paper • 2502.17363 • Published Feb 24, 2025 • 37
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Paper • 2502.18461 • Published Feb 25, 2025 • 17
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Paper • 2503.06698 • Published Mar 9, 2025 • 4
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

Paper • 2503.07027 • Published Mar 10, 2025 • 30
YuE: Scaling Open Foundation Models for Long-Form Music Generation

Paper • 2503.08638 • Published Mar 11, 2025 • 73
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

Paper • 2503.08677 • Published Mar 11, 2025 • 29
Personalize Anything for Free with Diffusion Transformer

Paper • 2503.12590 • Published Mar 16, 2025 • 44
Efficient Personalization of Quantized Diffusion Model without Backpropagation

Paper • 2503.14868 • Published Mar 19, 2025 • 20
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Paper • 2503.14478 • Published Mar 18, 2025 • 48
Video-T1: Test-Time Scaling for Video Generation

Paper • 2503.18942 • Published Mar 24, 2025 • 90
Judge Anything: MLLM as a Judge Across Any Modality

Paper • 2503.17489 • Published Mar 21, 2025 • 23
Training-free Diffusion Acceleration with Bottleneck Sampling

Paper • 2503.18940 • Published Mar 24, 2025 • 12
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

Paper • 2503.20672 • Published Mar 26, 2025 • 14
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation

Paper • 2503.22194 • Published Mar 28, 2025 • 25
Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model

Paper • 2503.22622 • Published Mar 28, 2025 • 18
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model

Paper • 2503.21144 • Published Mar 27, 2025 • 27
Optimal Stepsize for Diffusion Sampling

Paper • 2503.21774 • Published Mar 27, 2025 • 13
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published Apr 3, 2025 • 68
FreSca: Unveiling the Scaling Space in Diffusion Models

Paper • 2504.02154 • Published Apr 2, 2025 • 18
One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published Apr 7, 2025 • 110
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Paper • 2504.08736 • Published Apr 11, 2025 • 46
Alchemist: Turning Public Text-to-Image Data into Generative Gold

Paper • 2505.19297 • Published May 25, 2025 • 86
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Paper • 2505.18445 • Published May 24, 2025 • 63
Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models

Paper • 2506.00996 • Published Jun 1, 2025 • 40
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Paper • 2506.06276 • Published Jun 6, 2025 • 27
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Paper • 2506.07986 • Published Jun 9, 2025 • 19
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11, 2025 • 55
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10, 2025 • 109
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Paper • 2506.08009 • Published Jun 9, 2025 • 31
OmniGen2: Exploration to Advanced Multimodal Generation

Paper • 2506.18871 • Published Jun 23, 2025 • 79
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Paper • 2506.18903 • Published Jun 23, 2025 • 22
Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models

Paper • 2506.18900 • Published Jun 23, 2025 • 3
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

Paper • 2506.17612 • Published Jun 21, 2025 • 65
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling

Paper • 2506.20452 • Published Jun 25, 2025 • 18
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Paper • 2506.17201 • Published Jun 20, 2025 • 55
FairyGen: Storied Cartoon Video from a Single Child-Drawn Character

Paper • 2506.21272 • Published Jun 26, 2025 • 9
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation

Paper • 2506.18899 • Published Jun 23, 2025 • 6
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

Paper • 2506.19851 • Published Jun 24, 2025 • 60
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers

Paper • 2507.12956 • Published Jul 17, 2025 • 25
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

Paper • 2508.03694 • Published Aug 5, 2025 • 53
Matrix-3D: Omnidirectional Explorable 3D World Generation

Paper • 2508.08086 • Published Aug 11, 2025 • 76
Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

Paper • 2508.07901 • Published Aug 11, 2025 • 40
Story2Board: A Training-Free Approach for Expressive Storyboard Generation

Paper • 2508.09983 • Published Aug 13, 2025 • 70
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

Paper • 2508.10881 • Published Aug 14, 2025 • 54
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos

Paper • 2512.10881 • Published Dec 11, 2025 • 32
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

Paper • 2604.18168 • Published Apr 20 • 96

Upvote

Collection guide
Browse collections