๐Ÿ› Bug Report: Incorrect VAE Used in Pipeline

#3
by PierrunoYT - opened

Title: Fix Default VAE: Use AutoencoderKL Instead of AutoencoderTiny


📝 Description

The app currently loads a high-quality AutoencoderKL from the FLUX.1-Krea-dev model repository, but uses taef1 (AutoencoderTiny) as the VAE in the pipeline by default.

This causes:

  • 📉 Noticeable drop in image quality
  • ⚠️ Runtime warning during startup:
    Expected types for vae: (<class 'AutoencoderKL'>,), got <class 'AutoencoderTiny'>.
  • 💥 Wasted memory and GPU resources (loads two VAEs)
  • ❌ Misalignment with the expected high-fidelity output of FLUX.1 models

🔍 Root Cause

In app.py, the pipeline is initialized using taef1 (a lightweight, low-fidelity decoder), even though the full AutoencoderKL is already loaded:

pipe = DiffusionPipeline.from_pretrained("PierrunoYT/FLUX.1-Krea-dev", ..., vae=taef1)

Despite its speed, AutoencoderTiny sacrifices detail, color accuracy, and reconstruction quality, making it unsuitable as the default VAE for a high-end model like FLUX.1-Krea-dev.

Meanwhile, good_vae (an AutoencoderKL) is correctly loaded from the model's vae subfolder but is ignored in favor of the lower-quality alternative.
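
For reference, loading that full VAE presumably looks something like the snippet below (a sketch based on the description above, not necessarily the exact code in app.py):

import torch
from diffusers import AutoencoderKL

# High-quality VAE shipped in the model's `vae` subfolder.
# bfloat16 is assumed here; the app would use whatever `dtype` it defines.
good_vae = AutoencoderKL.from_pretrained(
    "PierrunoYT/FLUX.1-Krea-dev", subfolder="vae", torch_dtype=torch.bfloat16
)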

✅ Expected Behavior

By default, the pipeline should use the high-quality AutoencoderKL included in the FLUX.1-Krea-dev model for accurate and detailed image generation.

💡 Solution

Replace taef1 with good_vae when initializing the pipeline:

pipe = DiffusionPipeline.from_pretrained(
    "PierrunoYT/FLUX.1-Krea-dev",
    torch_dtype=dtype,
    vae=good_vae  # ← AutoencoderKL (high quality), not taef1
).to(device)

🛠️ Suggested Fix

Update the pipeline creation line:

- pipe = DiffusionPipeline.from_pretrained("PierrunoYT/FLUX.1-Krea-dev", torch_dtype=dtype, vae=taef1).to(device)
+ pipe = DiffusionPipeline.from_pretrained("PierrunoYT/FLUX.1-Krea-dev", torch_dtype=dtype, vae=good_vae).to(device)

Additionally:

  • Remove the unused taef1 loading line:
    taef1 = AutoencoderTiny.from_pretrained("PierrunoYT/taef1", ...)
    
  • Remove the AutoencoderTiny import if no longer needed.

✅ This cleanup improves performance, reduces VRAM usage, and eliminates misleading warnings.

📌 Impact

  • ✅ Sharper, more detailed image outputs
  • ✅ No more VAE type mismatch warnings
  • ✅ Better use of the model's full capabilities
  • ✅ Reduced memory footprint

Thank you for your excellent work bringing FLUX.1-Krea-dev to Pinokio!
This small change ensures users experience the true quality that FLUX.1 was designed for.


✅ One-sentence summary for maintainers:
The pipeline currently uses AutoencoderTiny (taef1) by default instead of the higher-quality AutoencoderKL included in the model, unnecessarily degrading output fidelity.

PierrunoYT changed discussion title from ### 🐛 Bug Report: Incorrect VAE Used in Pipeline to 🐛 Bug Report: Incorrect VAE Used in Pipeline

Are you serious?? The purpose of taef1 is to act as a smaller and faster VAE for decoding the intermediate latent outputs for each denoising step.


When all the denoising steps are done, the good_vae is used to decode the final latent and produce the final output image, which should be high quality as expected. There is not much need to use the good_vae on the intermediate latents, because they are simply previews of the final image, and doing so would significantly slow down the diffusion process (and since people generally use their limited free ZeroGPU time allocation in this demo, it would also unnecessarily waste their usage quota).
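
Roughly, the two-VAE split looks like the sketch below. This is illustrative only, not the Space's actual code: the real pipeline exposes the latents through a custom pipeline call, and Flux latents also need to be unpacked and de-scaled before decoding, which is omitted here.

import torch
from diffusers import AutoencoderKL, AutoencoderTiny

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16

# Tiny VAE: fast but lower fidelity -- good enough for per-step previews.
taef1 = AutoencoderTiny.from_pretrained("PierrunoYT/taef1", torch_dtype=dtype).to(device)

# Full VAE: slower but high fidelity -- used once, on the final latents.
good_vae = AutoencoderKL.from_pretrained(
    "PierrunoYT/FLUX.1-Krea-dev", subfolder="vae", torch_dtype=dtype
).to(device)

@torch.no_grad()
def preview_image(latents):
    # Called on every intermediate denoising step: speed matters, quality barely does.
    return taef1.decode(latents).sample

@torch.no_grad()
def final_image(latents):
    # Called exactly once, after the last step: this is the image the user keeps.
    return good_vae.decode(latents).sample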

Have you even taken a proper look at the code, or tried to run the code locally?? And I am pretty sure you simply used an LLM to write this, without doing anything else on your end. How do you even know there would be "VAE type mismatch warnings"??


The model card specifically states that it is meant to be compatible with the Flux architecture, so there should not be any type mismatch at all. When I tried running the code locally (with adjustments, such as using the DF11 version of the transformer weights, and a crude attempt to manually swap the text encoder and transformer components in and out of VRAM, since enable_model_cpu_offload() seems unable to work with a custom pipeline call) to fit it into my 24GB of VRAM, it works as expected, and the only extra "warning" message is as follows:

The config attributes {'block_out_channels': [64, 64, 64, 64]} were passed to AutoencoderTiny, but are not expected and will be ignored. Please verify your config.json configuration file.

And that is it. The live preview of intermediate denoising steps indeed works as expected, and the final output is indeed "high-fidelity".
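
For what it's worth, the crude swap mentioned above amounts to roughly the sketch below. The attribute names (text_encoder, text_encoder_2, transformer) follow the standard Flux pipeline layout; this is a simplified illustration, not the exact local adjustments.

import torch

@torch.no_grad()
def generate(pipe, prompt, device="cuda"):
    # Encode the prompt with only the text encoders on the GPU.
    pipe.text_encoder.to(device)
    pipe.text_encoder_2.to(device)
    prompt_embeds, pooled_prompt_embeds, _ = pipe.encode_prompt(
        prompt=prompt, prompt_2=prompt, device=device
    )
    # Move them back to CPU and free the cache before loading the transformer.
    pipe.text_encoder.to("cpu")
    pipe.text_encoder_2.to("cpu")
    torch.cuda.empty_cache()

    # Run denoising with only the transformer (plus the VAEs) resident on the GPU.
    pipe.transformer.to(device)
    return pipe(
        prompt_embeds=prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        num_inference_steps=28,
    ).images[0]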

Mhh okay, thx for your answer. If I was wrong, I'm sorry. I got the mismatch from the AutoEncoder and the AI told me this.
