Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
Abstract
Omni-Effects is a unified framework that generates prompt-guided and spatially controllable composite visual effects using a LoRA-based Mixture of Experts and a Spatial-Aware Prompt with Independent-Information Flow.
Visual effects (VFX) are visual enhancements fundamental to modern cinematic production. Although video generation models offer cost-efficient solutions for VFX production, current methods are constrained by per-effect LoRA training, which limits generation to single effects. This fundamental limitation impedes applications that require spatially controllable composite effects, i.e., the concurrent generation of multiple effects at designated locations. Integrating diverse effects into a unified framework, however, faces two major challenges: interference between effect variations and spatial uncontrollability during multi-VFX joint training. To tackle these challenges, we propose Omni-Effects, the first unified framework capable of generating prompt-guided effects and spatially controllable composite effects. The core of our framework comprises two key innovations: (1) LoRA-based Mixture of Experts (LoRA-MoE), which employs a group of expert LoRAs to integrate diverse effects within a unified model while effectively mitigating cross-task interference; and (2) Spatial-Aware Prompt (SAP), which incorporates spatial mask information into the text tokens to enable precise spatial control. Furthermore, we introduce an Independent-Information Flow (IIF) module within the SAP that isolates the control signals of individual effects to prevent unwanted blending. To facilitate this research, we construct Omni-VFX, a comprehensive VFX dataset, via a novel data collection pipeline combining image editing and First-Last Frame-to-Video (FLF2V) synthesis, and introduce a dedicated VFX evaluation framework for validating model performance. Extensive experiments demonstrate that Omni-Effects achieves precise spatial control and diverse effect generation, enabling users to specify both the category and location of desired effects.
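To make the two abstract-level ideas concrete, here is a minimal sketch (not the authors' code) of how a LoRA-based Mixture of Experts layer and a Spatial-Aware Prompt could look in PyTorch. All module names, dimensions, the softmax routing scheme, and the `mask_encoder` interface are assumptions for illustration only; the paper's actual architecture may differ.

```python
# Illustrative sketch only: LoRA-MoE layer and SAP token concatenation.
# Names, dimensions, and routing are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAMoELinear(nn.Module):
    """A frozen base linear layer augmented with a group of expert LoRAs.

    A lightweight router produces per-token weights over the experts, so
    different effects can be handled by different low-rank adapters while
    sharing one backbone (the idea behind mitigating cross-task interference).
    """

    def __init__(self, in_dim: int, out_dim: int, num_experts: int = 4, rank: int = 16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)  # backbone stays frozen
        # Expert LoRAs: one low-rank down/up projection pair per expert.
        self.lora_down = nn.Parameter(torch.randn(num_experts, in_dim, rank) * 0.01)
        self.lora_up = nn.Parameter(torch.zeros(num_experts, rank, out_dim))
        self.router = nn.Linear(in_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, in_dim)
        gate = F.softmax(self.router(x), dim=-1)                    # (B, T, E)
        down = torch.einsum("btd,edr->bter", x, self.lora_down)     # (B, T, E, r)
        up = torch.einsum("bter,ero->bteo", down, self.lora_up)     # (B, T, E, out)
        lora_out = (gate.unsqueeze(-1) * up).sum(dim=2)             # weighted expert sum
        return self.base(x) + lora_out


def spatial_aware_prompt(text_tokens: torch.Tensor, masks: torch.Tensor,
                         mask_encoder: nn.Module) -> torch.Tensor:
    """Append mask-derived tokens to the text tokens (SAP, simplified).

    text_tokens: (B, T_text, D); masks: (B, N_effects, H, W), one binary mask
    per effect. `mask_encoder` (assumed here) maps masks to (B, N_effects, D).
    Keeping one token per effect is a stand-in for the paper's
    Independent-Information Flow, which isolates each effect's control signal.
    """
    mask_tokens = mask_encoder(masks)                  # (B, N_effects, D)
    return torch.cat([text_tokens, mask_tokens], dim=1)
```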
Community
Very cool, amazing!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation (2025)
- FramePrompt: In-context Controllable Animation with Zero Structural Changes (2025)
- FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers (2025)
- MENTOR: Efficient Multimodal-Conditioned Tuning for Autoregressive Vision Generation Models (2025)
- Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition (2025)
- DivControl: Knowledge Diversion for Controllable Image Generation (2025)
- Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Models citing this paper 1
Datasets citing this paper 2
Spaces citing this paper 0