TeaCache & Wan 2.1 Integration Tutorial for SwarmUI

Community Article Published May 22, 2025

YouTube Video: https://youtu.be/gFMUChHgXYk

📋 Overview

This tutorial demonstrates how to use TeaCache to significantly accelerate generation in SwarmUI with the ComfyUI backend, and how to properly configure Wan 2.1 Image-to-Video and Text-to-Video models with optimized presets for maximum performance.

🔗 Essential Resources

Download Links

Prerequisites Tutorials

Community Resources

⏱️ Tutorial Timeline

Time Topic
0:00 Introduction: Teacache & Wan 2.1 Presets for Swarm UI
0:35 Prerequisites: Previous Tutorials & Updating Swarm UI Files
1:09 Running the Swarm UI Update Script
1:21 Importing the New Presets into Swarm UI
1:46 Enabling Advanced Options & Locating Teacache Installer
1:57 Understanding Teacache: Faster Generation, Minimal Quality Loss
2:14 Monitoring Teacache Installation Process via CMD
2:32 Teacache Installed: Preparing for Image-to-Video Generation
2:43 Applying Image-to-Video Preset & Initial Configuration
3:04 Selecting Init Image & Base Model (e.g., Wan 2.1 480p)
3:25 How to Download Models via Swarm UI Downloader
3:52 Choosing Specific Image-to-Video Models (FP16/GGUF Q8)
4:04 Setting Correct Resolution & Aspect Ratio from Model Metadata
4:25 Key Image-to-Video Settings: Model Override & Video Frames
4:42 Optimizing Video Steps (30) & CFG (6) for Teacache
5:01 Configuring Teacache Mode (All) & Threshold (15%)
5:08 Setting Frame Interpolation (2x for 32 FPS) & Duration
5:22 Starting Image-to-Video: Importance of Latest Swarm UI
5:41 Generation Started: Teacache & Step Skipping Explained
6:05 Observing Teacache in Action: Step Jumps & How It Works
6:23 Leveraging Sage Attention & ComfyUI's Automated Setup
6:38 Teacache Performance Boost: Example Speed Increase (IT/s)
6:51 Understanding ComfyUI Block Swapping & Monitoring GPU Usage
7:18 Image-to-Video Generation Complete: Total Time & Output
7:32 Accessing Generated Video & Output Format Options (H.265)
7:55 Text-to-Video: Applying Preset & Adjusting Core Settings
8:13 Configuring Text-to-Video Parameters: Steps (30), FPS, Format
8:27 Selecting Text-to-Video Model (GGUF Q8) & Setting Resolution
8:45 Advanced Settings: UniPC Sampler, Sigma Shift (8), CFG Impact
9:03 Enabling Teacache (15%) for Text-to-Video
9:15 Starting HD Text-to-Video Generation (GGUF Q8 Model)
9:36 Understanding Performance: HD Resolution & Frame Count Impact
9:54 Text-to-Video Complete: Time Taken & Teacache Speedup
10:06 Downloading & Reviewing the Full HD Text-to-Video Result
10:19 Comparing Prompt Effectiveness: Image-to-Video vs. Text-to-Video
10:30 Conclusion: Future Presets & Power of Swarm UI with ComfyUI
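
For quick reference, the key generation settings called out in the chapter list above can be summarized as follows. This is a hypothetical summary written as a Python dictionary; the field names are illustrative and do not correspond to an exported SwarmUI preset file.

```python
# Hypothetical summary of the Wan 2.1 settings mentioned in the video chapters above.
# Field names are illustrative, not actual SwarmUI preset keys.
wan21_image_to_video = {
    "base_model": "Wan 2.1 480p (FP16 or GGUF Q8)",
    "video_steps": 30,            # step count tuned for TeaCache
    "cfg_scale": 6,
    "teacache_mode": "all",
    "teacache_threshold": 0.15,   # 15% relative L1 threshold
    "frame_interpolation": 2,     # 2x interpolation for 32 FPS output
    "output_format": "H.265",
}

wan21_text_to_video = {
    "base_model": "Wan 2.1 GGUF Q8",
    "video_steps": 30,
    "sampler": "UniPC",
    "sigma_shift": 8,
    "teacache_threshold": 0.15,
}
```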

TeaCache: Brewing Faster Inference for Diffusion Models

Diffusion models have revolutionized image, video, and audio generation, producing stunningly realistic and creative outputs. However, their iterative denoising process, often involving hundreds of steps, makes inference notoriously slow. Addressing this bottleneck, TeaCache (Timestep Embedding Aware Cache) emerges as an innovative, training-free approach to significantly accelerate these models without substantial degradation in output quality.


The Challenge: The Iterative Nature of Diffusion

At their core, diffusion models work by progressively removing noise from an initial random state over a series of “timesteps.” Each timestep involves a computationally intensive pass through a large neural network (often a U-Net or Transformer). The sheer number of these steps is the primary reason for long generation times, hindering rapid prototyping and real-time applications.
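
To make the bottleneck concrete, here is a minimal sketch of a sampling loop (illustrative only, not any particular library's API): the expensive model call sits inside the loop and runs once per timestep.

```python
def sample(model, scheduler_step, noise, timesteps):
    """Illustrative diffusion sampling loop; the per-step model call dominates runtime."""
    x = noise
    for t in timesteps:                       # commonly 30-1000 timesteps
        noise_pred = model(x, t)              # expensive U-Net / Transformer forward pass
        x = scheduler_step(x, noise_pred, t)  # remove a little noise, move to the next step
    return x
```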


How TeaCache Works: The Secret Sauce

TeaCache’s brilliance lies in its observation that computations at adjacent timesteps, especially in the later stages of denoising, often produce highly similar intermediate results or “residuals” (the difference between the model’s output and its input). Instead of recomputing everything at every step, TeaCache intelligently decides when to reuse cached information.

The “Timestep Embedding Aware” part is crucial. Here’s a breakdown of its mechanism:

  • Timestep Embedding as a Proxy:
    Diffusion models use timestep embeddings — vector representations of the current denoising step — to guide the model’s behavior. TeaCache hypothesizes that the difference between consecutive timestep embeddings can serve as a good indicator of how much the model’s internal state (and thus its output) will change.

  • Predicting Similarity:
    At each denoising step, TeaCache compares the current timestep embedding with the one from the previously computed step.

  • Rescaling and Thresholding:
    This raw difference in embeddings is then rescaled using a model-specific polynomial function (defined by coefficients in the TeaCache implementation for various models). This rescaled difference represents an estimated “relative L1 distance” between the model’s potential outputs.

  • Caching Decision:
    This estimated distance is compared against a user-defined rel_l1_thresh (relative L1 threshold).

    • If the distance is below the threshold:
      It implies that the model’s output for the current step is likely to be very similar to the previous one. TeaCache then skips the full, expensive computation for the current step. Instead, it reuses the previously computed residual (the difference between the model’s output and its input from the last fully computed step) and applies it to the current step’s input.

    • If the distance is above the threshold (or if it’s the first/last step):
      TeaCache performs the full computation, updates its cache with the new residual, and resets its accumulated distance counter.

  • Accumulated Distance:
    The system keeps an accumulated_rel_l1_distance. If several consecutive steps are skipped, this accumulated distance grows. Once it surpasses the rel_l1_thresh, a full computation is triggered.

This adaptive caching strategy allows TeaCache to skip redundant computations while ensuring that the model performs a full update when significant changes are expected, thus maintaining quality.
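
A minimal sketch of that decision logic in Python may help make it concrete. This is a simplified illustration, not the official TeaCache implementation; the class name, the polynomial coefficients, and the embedding-distance formula are placeholders standing in for the model-specific values shipped in the real code.

```python
import numpy as np

class TeaCacheState:
    """Simplified sketch of TeaCache's skip-or-compute decision (not the official code)."""

    def __init__(self, rel_l1_thresh=0.15, coefficients=(1.0, 0.0)):
        self.rel_l1_thresh = rel_l1_thresh
        self.coefficients = coefficients     # model-specific rescaling polynomial (placeholder)
        self.accumulated_distance = 0.0
        self.prev_embedding = None
        self.cached_residual = None

    def should_compute(self, emb, is_first_or_last):
        """Decide whether to run the full model at this timestep."""
        if is_first_or_last or self.prev_embedding is None or self.cached_residual is None:
            return True
        # Relative L1 distance between consecutive timestep embeddings.
        raw = np.abs(emb - self.prev_embedding).mean() / (np.abs(self.prev_embedding).mean() + 1e-8)
        # Rescale with the model-specific polynomial to estimate how much the output will change.
        self.accumulated_distance += np.polyval(self.coefficients, raw)
        return self.accumulated_distance >= self.rel_l1_thresh

    def step(self, x, emb, run_model, is_first_or_last=False):
        """Either run the full model or reuse the cached residual."""
        if self.should_compute(emb, is_first_or_last):
            out = run_model(x)                 # full, expensive forward pass
            self.cached_residual = out - x     # cache the residual for possible reuse
            self.accumulated_distance = 0.0    # reset after a full computation
        else:
            out = x + self.cached_residual     # skip the model: reuse the last residual
        self.prev_embedding = emb
        return out
```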


Key Advantages of TeaCache

  • Training-Free:
    One of TeaCache’s most significant advantages is that it requires no model retraining or fine-tuning. It can be applied “on top” of existing pre-trained diffusion models.

  • Significant Speedup:
    As demonstrated in its repository, TeaCache can provide substantial inference speedups, often in the range of 1.5x to over 2x, depending on the model and the chosen threshold.

  • Broad Model Compatibility:
    While initially focused on video diffusion models, TeaCache has shown effectiveness across image and audio diffusion models as well.

  • User-Controllable Trade-off:
    The rel_l1_thresh parameter gives users a direct way to balance inference speed against output quality. Higher thresholds lead to more aggressive caching and faster generation but may introduce slight quality degradation (see the brief example below).
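
For instance, continuing the hypothetical TeaCacheState sketch from earlier, the threshold is the only knob a typical user touches:

```python
# Hypothetical usage of the TeaCacheState sketch shown earlier.
conservative = TeaCacheState(rel_l1_thresh=0.10)  # fewer skipped steps, closest to baseline quality
balanced = TeaCacheState(rel_l1_thresh=0.15)      # the 15% setting used in this tutorial's presets
aggressive = TeaCacheState(rel_l1_thresh=0.25)    # more skipped steps, fastest but most quality risk
```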


Supported Models

TeaCache offers impressive versatility, with dedicated implementations and support for a growing list of popular diffusion models:

Text-to-Video (T2V):

  • Wan2.1
  • Cosmos
  • CogVideoX1.5
  • LTX-Video
  • Mochi
  • HunyuanVideo
  • CogVideoX
  • Open-Sora
  • Open-Sora-Plan
  • Latte
  • EasyAnimate (via community)
  • FramePack (via community)
  • FastVideo (via community)

Image-to-Video (I2V):

  • Wan2.1
  • Cosmos
  • CogVideoX1.5
  • ConsisID
  • EasyAnimate (via community)
  • Ruyi-Models (via community)

Video-to-Video (V2V):

  • EasyAnimate (via community)

Text-to-Image (T2I):

  • FLUX
  • Lumina-T2X

Text-to-Audio (T2A):

  • TangoFlux

🎓 About the Creator

Dr. Furkan Gözükara - Assistant Professor in Software Engineering

  • 🎓 PhD in Computer Engineering
  • 📺 37,000+ YouTube subscribers
  • 🎯 Expert-level tutorials on AI, Stable Diffusion, and generative models

📞 Connect & Learn


This tutorial provides comprehensive guidance for implementing TeaCache acceleration in SwarmUI, enabling faster AI video and image generation with minimal quality loss.
