Cobra: Efficient Line Art COlorization with BRoAder References
Abstract
The comic production industry requires reference-based line art colorization with high accuracy, efficiency, contextual consistency, and flexible control. A comic page often involves diverse characters, objects, and backgrounds, which complicates the coloring process. Despite advances in diffusion models for image generation, their application to line art colorization remains limited by three challenges: handling extensive reference images, slow inference, and flexible control. We investigate how necessary extensive contextual image guidance is to the quality of line art colorization. To address these challenges, we introduce Cobra, an efficient and versatile method that supports color hints and utilizes over 200 reference images while maintaining low latency. Central to Cobra is a Causal Sparse DiT architecture, which leverages specially designed positional encodings, causal sparse attention, and a Key-Value Cache to effectively manage long-context references and ensure color identity consistency. Results demonstrate that Cobra achieves accurate line art colorization through extensive contextual reference while significantly improving inference speed and interactivity, thereby meeting critical industrial demands. We release our code and models on our project page: https://zhuang2002.github.io/Cobra/.
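To make the idea of causal sparse attention with a Key-Value Cache more concrete, here is a minimal NumPy sketch. It is not the Cobra implementation: the abstract does not specify the mask, so the layout below (each reference image attends only to its own tokens, and the line-art tokens attend to every reference and to themselves) is an assumption, and the names `causal_sparse_mask` and `masked_attention` are hypothetical.

```python
import numpy as np


def causal_sparse_mask(num_refs: int, tokens_per_ref: int, target_tokens: int) -> np.ndarray:
    """Boolean mask where mask[q, k] is True if query token q may attend to key token k.

    Assumed token layout: all reference tokens first, then the target (line art) tokens.
    """
    ref_total = num_refs * tokens_per_ref
    total = ref_total + target_tokens
    mask = np.zeros((total, total), dtype=bool)

    # Sparse part (assumption): a reference image attends only to its own tokens,
    # so no attention is computed between different reference images.
    for r in range(num_refs):
        start = r * tokens_per_ref
        mask[start:start + tokens_per_ref, start:start + tokens_per_ref] = True

    # Causal part (assumption): target tokens see every reference and themselves,
    # while reference tokens never see the target.
    mask[ref_total:, :] = True
    return mask


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention restricted by a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Toy sizes: 2 reference images of 3 tokens each, 4 target tokens, feature dim 8.
mask = causal_sparse_mask(num_refs=2, tokens_per_ref=3, target_tokens=4)
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 10, 8))
out = masked_attention(q, k, v, mask)
```

Under this mask, the outputs at reference positions do not depend on the target tokens. That independence is what would allow reference keys and values to be computed once and reused at every denoising step, and the cost of the reference part grows linearly rather than quadratically with the number of reference images.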
Community
Paper Link: https://arxiv.org/abs/2504.12240
Project Page: https://zhuang2002.github.io/Cobra/
Code: https://github.com/Zhuang2002/Cobra
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities (2025)
- VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control (2025)
- Image Referenced Sketch Colorization Based on Animation Creation Workflow (2025)
- MagicColor: Multi-Instance Sketch Colorization (2025)
- BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing (2025)
- Inversion-Free Video Style Transfer with Trajectory Reset Attention Control and Content-Style Bridging (2025)
- DP-Adapter: Dual-Pathway Adapter for Boosting Fidelity and Text Consistency in Customizable Human Image Generation (2025)