# Bug Report: Incorrect VAE Used in Pipeline
**Title:** Fix Default VAE: Use `AutoencoderKL` Instead of `AutoencoderTiny`
## Description

The app currently loads a high-quality `AutoencoderKL` from the FLUX.1-Krea-dev model repository, but uses `taef1` (`AutoencoderTiny`) as the VAE in the pipeline by default.

This causes:

- Noticeable drop in image quality
- A runtime warning during startup:
  `Expected types for vae: (<class 'AutoencoderKL'>,), got <class 'AutoencoderTiny'>.`
- Wasted memory and GPU resources (two VAEs are loaded)
- Misalignment with the expected high-fidelity output of FLUX.1 models
## Root Cause

In `app.py`, the pipeline is initialized with `taef1` (a lightweight, low-fidelity decoder), even though the full `AutoencoderKL` is already loaded:

```python
pipe = DiffusionPipeline.from_pretrained("PierrunoYT/FLUX.1-Krea-dev", ..., vae=taef1)
```

Despite its speed, `AutoencoderTiny` sacrifices detail, color accuracy, and reconstruction quality, making it unsuitable as the default VAE for a high-end model like FLUX.1-Krea-dev.

Meanwhile, `good_vae` (an `AutoencoderKL`) is correctly loaded from the model's `vae` subfolder but is ignored in favor of the lower-quality alternative.
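For reference, the loading side presumably looks roughly like this (a minimal sketch reconstructed from the snippets in this report; `dtype` and `device` are placeholders, not the app's exact values):

```python
import torch
from diffusers import DiffusionPipeline, AutoencoderKL, AutoencoderTiny

dtype = torch.bfloat16  # placeholder; the app may use a different dtype
device = "cuda"

# Full-quality VAE shipped in the model repo's `vae` subfolder
good_vae = AutoencoderKL.from_pretrained(
    "PierrunoYT/FLUX.1-Krea-dev", subfolder="vae", torch_dtype=dtype
)

# Tiny, fast VAE
taef1 = AutoencoderTiny.from_pretrained("PierrunoYT/taef1", torch_dtype=dtype)

# The pipeline itself is constructed with the tiny VAE
pipe = DiffusionPipeline.from_pretrained(
    "PierrunoYT/FLUX.1-Krea-dev", torch_dtype=dtype, vae=taef1
).to(device)
```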
## Expected Behavior

By default, the pipeline should use the high-quality `AutoencoderKL` included in the FLUX.1-Krea-dev model for accurate and detailed image generation.
## Solution

Replace `taef1` with `good_vae` when initializing the pipeline:

```python
pipe = DiffusionPipeline.from_pretrained(
    "PierrunoYT/FLUX.1-Krea-dev",
    torch_dtype=dtype,
    vae=good_vae  # AutoencoderKL (high quality), not taef1
).to(device)
```
## Suggested Fix

Update the pipeline creation line:

```diff
- pipe = DiffusionPipeline.from_pretrained("PierrunoYT/FLUX.1-Krea-dev", torch_dtype=dtype, vae=taef1).to(device)
+ pipe = DiffusionPipeline.from_pretrained("PierrunoYT/FLUX.1-Krea-dev", torch_dtype=dtype, vae=good_vae).to(device)
```

Additionally:

- Remove the now-unused `taef1` loading line: `taef1 = AutoencoderTiny.from_pretrained("PierrunoYT/taef1", ...)`
- Remove the `AutoencoderTiny` import if it is no longer needed.

This cleanup improves performance, reduces VRAM usage, and eliminates misleading warnings.
## Impact

- Sharper, more detailed image outputs
- No more VAE type mismatch warnings
- Better use of the model's full capabilities
- Reduced memory footprint
Thank you for your excellent work bringing FLUX.1-Krea-dev to Pinokio!
This small change ensures users experience the true quality that FLUX.1 was designed for.
**One-sentence summary for maintainers:** The pipeline currently uses `AutoencoderTiny` (`taef1`) by default instead of the higher-quality `AutoencoderKL` included in the model, unnecessarily degrading output fidelity.
---

Are you serious?? The purpose of `taef1` is to act as a smaller, faster VAE for decoding the intermediate latent outputs at each denoising step. When all the denoising steps are done, `good_vae` is used to decode the final latent and produce the final output image, which is high quality as expected. There is not much need to use `good_vae` on the intermediate latents, because they are simply previews of the final image, and doing so would significantly slow down the diffusion process (and since people generally use their limited free ZeroGPU time allocation in this demo, it would also unnecessarily waste their usage quota).
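To spell out that pattern, here is a rough, self-contained sketch of the two-VAE idea. This is not the Space's actual code (the Space uses a custom pipeline call); it only illustrates the mechanism using stock diffusers callbacks, and the prompt, sizes, dtype, and step count below are placeholders:

```python
import torch
from diffusers import FluxPipeline, AutoencoderKL, AutoencoderTiny

dtype = torch.bfloat16
device = "cuda"
height, width = 1024, 1024

# Full VAE from the model repo (final decode) and tiny VAE (previews only)
good_vae = AutoencoderKL.from_pretrained(
    "PierrunoYT/FLUX.1-Krea-dev", subfolder="vae", torch_dtype=dtype
).to(device)
taef1 = AutoencoderTiny.from_pretrained("PierrunoYT/taef1", torch_dtype=dtype).to(device)

# The pipeline carries the tiny VAE; the full VAE is only used at the very end.
pipe = FluxPipeline.from_pretrained(
    "PierrunoYT/FLUX.1-Krea-dev", torch_dtype=dtype, vae=taef1
).to(device)

previews = []

def preview_callback(pipeline, step, timestep, callback_kwargs):
    # FLUX keeps its latents "packed"; unpack them before decoding.
    lat = FluxPipeline._unpack_latents(
        callback_kwargs["latents"], height, width, pipeline.vae_scale_factor
    )
    # taef1 is trained to decode diffusion-space latents directly,
    # so the cheap preview needs no extra scaling or shifting here.
    img = taef1.decode(lat.to(taef1.dtype), return_dict=False)[0]
    previews.append(pipeline.image_processor.postprocess(img, output_type="pil")[0])
    return callback_kwargs

# Run the pipeline but keep the final result as latents ...
latents = pipe(
    "a photo of a forest at dawn",
    height=height,
    width=width,
    num_inference_steps=28,
    output_type="latent",
    callback_on_step_end=preview_callback,
    callback_on_step_end_tensor_inputs=["latents"],
).images

# ... then decode the final latents once with the full AutoencoderKL.
latents = FluxPipeline._unpack_latents(latents, height, width, pipe.vae_scale_factor)
latents = latents / good_vae.config.scaling_factor + good_vae.config.shift_factor
final_image = good_vae.decode(latents.to(good_vae.dtype), return_dict=False)[0]
final_image = pipe.image_processor.postprocess(final_image, output_type="pil")[0]
```

The tiny VAE only ever touches the throwaway per-step previews; the full `AutoencoderKL` still produces the image the user actually receives.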
Have you even taken a proper look at the code, or tried to run it locally?? And I am pretty sure you simply used an LLM to write this, without doing anything else on your end. How would you even know there would be "VAE type mismatch warnings"?

The model card specifically states that it is meant to be compatible with the Flux architecture, so there should not be any type mismatch at all. When I tried running the code locally (with adjustments, such as using the DF11 version of the transformer weights and a crude attempt to manually swap the text encoder and transformer components in and out of VRAM, since `enable_model_cpu_offload()` seems unable to work with a custom pipeline call) to fit into my 24 GB of VRAM, it works as expected, and the only extra "warning" message is the following:

```
The config attributes {'block_out_channels': [64, 64, 64, 64]} were passed to AutoencoderTiny, but are not expected and will be ignored. Please verify your config.json configuration file.
```

And that is it. The live preview of intermediate denoising steps indeed works as expected, and the final output is indeed "high-fidelity".
---

Mhh okay, thx for your answer. If I was wrong, I'm sorry. I got the mismatch from the AutoEncoder, and the AI told me this.