Abstract
NABLA, a dynamic block-level attention mechanism, improves video diffusion transformers by enhancing computational efficiency without sacrificing generative quality.
Recent progress in transformer-based architectures has demonstrated remarkable success in video generation tasks. However, the quadratic complexity of full attention mechanisms remains a critical bottleneck, particularly for high-resolution and long-duration video sequences. In this paper, we propose NABLA, a novel Neighborhood Adaptive Block-Level Attention mechanism that dynamically adapts to sparsity patterns in video diffusion transformers (DiTs). By leveraging block-wise attention with an adaptive sparsity-driven threshold, NABLA reduces computational overhead while preserving generative quality. Our method does not require custom low-level operator design and integrates seamlessly with PyTorch's Flex Attention operator. Experiments demonstrate that NABLA achieves up to 2.7x faster training and inference than the full-attention baseline with almost no degradation in quantitative metrics (CLIP score, VBench score, human evaluation score) or visual quality. The code and model weights are available at: https://github.com/gen-ai-team/Wan2.1-NABLA
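As a rough illustration of the mechanism described in the abstract, the sketch below pools queries and keys into blocks, estimates block-level attention mass, keeps the smallest set of key blocks whose cumulative mass reaches a target fraction (an adaptive, CDF-style threshold), and feeds the resulting block mask to PyTorch's FlexAttention. The block size, the `keep_mass` parameter, and all function names are illustrative assumptions rather than the authors' released implementation (see the linked repository for that).

```python
# A minimal sketch of NABLA-style block-level adaptive sparse attention on top of
# PyTorch's FlexAttention. The block size, the `keep_mass` CDF-style threshold, and
# all function names are illustrative assumptions, not the authors' implementation.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

BLOCK = 128  # assumed block size; matches FlexAttention's default 128x128 tiles


def build_block_keep_mask(q, k, keep_mass=0.9):
    """Estimate which (query-block, key-block) pairs carry most of the attention mass.

    q, k: [B, H, S, D] with S divisible by BLOCK.
    Returns a bool map of shape [B, H, S // BLOCK, S // BLOCK].
    """
    B, H, S, D = q.shape
    nb = S // BLOCK
    # Average-pool queries/keys inside each block to get cheap block-level proxies.
    qb = q.view(B, H, nb, BLOCK, D).mean(dim=3)
    kb = k.view(B, H, nb, BLOCK, D).mean(dim=3)
    scores = torch.softmax(qb @ kb.transpose(-1, -2) / D**0.5, dim=-1)  # [B, H, nb, nb]
    # Adaptive threshold: for each query block, keep the smallest set of key blocks
    # whose cumulative probability reaches `keep_mass` (a CDF-style cutoff).
    sorted_scores, order = scores.sort(dim=-1, descending=True)
    cdf = sorted_scores.cumsum(dim=-1)
    keep_sorted = (cdf - sorted_scores) < keep_mass  # always keeps at least the top block
    return torch.zeros_like(scores).scatter_(-1, order, keep_sorted.float()) > 0


def nabla_like_attention(q, k, v, keep_mass=0.9):
    B, H, S, _ = q.shape
    nb = S // BLOCK
    keep_flat = build_block_keep_mask(q, k, keep_mass).reshape(-1)

    def mask_mod(b, h, q_idx, kv_idx):
        # Look up the precomputed block-level keep decision for this token pair.
        return keep_flat[((b * H + h) * nb + q_idx // BLOCK) * nb + kv_idx // BLOCK]

    block_mask = create_block_mask(mask_mod, B=B, H=H, Q_LEN=S, KV_LEN=S, device=q.device)
    return flex_attention(q, k, v, block_mask=block_mask)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    q, k, v = (torch.randn(1, 2, 1024, 64, device=device) for _ in range(3))
    print(nabla_like_attention(q, k, v).shape)  # torch.Size([1, 2, 1024, 64])
```

Because the keep mask is computed per batch element and head from the pooled block scores, the sparsity pattern adapts to the content of each sequence rather than following a fixed neighborhood, which is the behavior the abstract attributes to the adaptive threshold.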
Community
arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/nablanabla-neighborhood-adaptive-block-level-attention
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion (2025)
- VMoBA: Mixture-of-Block Attention for Video Diffusion Models (2025)
- Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers (2025)
- Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas (2025)
- PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models (2025)
- DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration (2025)
- Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows (2025)
Models citing this paper: 3
Datasets citing this paper: 0
Spaces citing this paper: 0