HOW TO USE flux-fp8 IN PYTHON?
Hello,
can anyone give me some sample code?
How do I use it in Python?
My code looks like this:
'''
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "Kijai/flux-fp8",
    cache_dir='flux_fp8',
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")
'''
But I get a 404 error.
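I wonder whether the 404 simply means the repo is a single .safetensors checkpoint rather than a full diffusers-format folder. If so, maybe something like this single-file approach is the way to load it (the file name and the base repo here are my guesses, so please treat this as a sketch, not a confirmed answer):
'''
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# Load only the fp8 checkpoint as the transformer (check the repo's file list
# for the exact name), then let the official repo provide the other components.
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors",
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe(
    "A cat holding a sign that says hello world",
    height=768,
    width=1360,
    num_inference_steps=4,
    guidance_scale=0.,
    max_sequence_length=256,
).images[0]
image.save("image.png")
'''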
same request
"""最近在研究一些生成式ai项目,处于某种不可描述的目的和研究兴趣,我并不是该行业的专业人士,但非常高兴来访相关的社区并且与相关人士进行交流,希望能学习或者交流一些有趣的技术"""
"""
My code may not be applicable to others, it is for reference only
My system environment: 64G RAM, 4070 12G VRAM, Windows 10 x64 22H2, 2.6.0+cu124, pycharm2024.3, python 3.12.7, cudann/xformers:installed
The last part of the article contains my generation effect description
"""
import torch
from diffusers import DiffusionPipeline, FluxTransformer2DModel, AutoencoderTiny
from torchao.quantization import quantize_, int8_weight_only
from transformers import CLIPTokenizer
torch.cuda.empty_cache()
"""
Clear VRAM before running.
"""
torch.backends.cuda.matmul.allow_tf32 = True
"""
Allowing TF32 matmuls should give a speed-up, but I don't know much about it, and the effect is small for me.
"""
from funcTools.ram_str import rams_only_num
"""
This is just a simple random-number helper I wrote to pick a seed.
"""
"""
Note:
I prefer offline deployment, and even though my hardware is not very powerful, the lightweight model runs pretty well.
If you are running the original diffusers model, examples are easy to find on the Hugging Face or diffusers websites.
link:
https://huggingface.co/docs/diffusers/v0.32.2/index
https://huggingface.co/
"""
vae_path = "./cache__/hf_models/FLUX.1-dev/vae"
fp8_model_cover_tf_path = r"t2i-model/flux/flux-safetensors-custom/xxx"
"""
Using the official Black Forest as a basis, you can download the flux model from civitai, convert it to the original diffuser model, and load it into the flux pipeline as tranformers.
Total output value, you can use tranformers to generate pictures for the parameters. The style depends on the large model you downloaded from civitai. You can also load lora. For details, you can see the official tutorial of the diffuser
Conversion tools: https://github.com/xhinker/sd_embed/blob/main/src/sd_embed/conversion_tools.py
sd1.5, xl and other models can be directly loaded offline using a single file (but you need to prepare the configuration file. If you need it, please reply to me. These codes are 10 times simpler than flux!)
"""
"""
fp8_model =r"./t2i-model/flux/flux-safetensors-custom/xxx.safetensors"
transformers_config_path = './cache__/hf_models/FLUX.1-dev/transformer/config.json'
"""
bfl_repo = "./cache__/hf_models/FLUX.1-dev"
tkr_path = "./cache__/hf_models/FLUX.1-dev/tokenizer"
dt = torch.float16
"""
The strange thing is that when I use fp16 in the flux pipeline, it will cause a pure black image and an error.
I don’t think this is a security review (because I have generated nsfw images in bf16 or fp32). Using fp16 in sd1.5, xl, and pony is completely fine and fast.
fp16 only takes up 51% of my 64g ram ;), which is much better than bf16, but I am sad that flux cannot be used :(
Error message:
Python312\Lib\site-packages\diffusers\image_processor.py:147: RuntimeWarning: invalid value encountered in cast
images = (images * 255).round().astype("uint8")
"""
tokenizer = CLIPTokenizer.from_pretrained(
tkr_path,
torch_dtype=dt,
clean_up_tokenization_spaces=True,
local_files_only=True,
)
transformer = FluxTransformer2DModel.from_pretrained(
fp8_model_cover_tf_path,
subfolder="transformer",
torch_dtype=dt,
use_safetensors=True,
local_files_only=True,
)
quantize_(transformer, int8_weight_only())  # torchao weight-only int8 quantization to shrink the transformer's memory footprint
vae = AutoencoderTiny.from_pretrained(
"./cache__/hf_models/taef1",
torch_dtype=dt,
use_safetensors=True,
local_files_only=True,
)
pipe = DiffusionPipeline.from_pretrained(
    bfl_repo,
    transformer=transformer,
    torch_dtype=dt,
    local_files_only=True,
    use_safetensors=True,
    vae=vae,
    # scheduler=scheduler,
    low_cpu_mem_usage=True,
    tokenizer=tokenizer,
    # device_map="", max_memory={0: "24GiB", "cpu": "4GiB"},
)
"""
scheduler = EulerDiscreteScheduler.from_pretrained('./cache__/hf_models/FLUX.1-dev/scheduler',)
or pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
"""
"""
Here are some optimization settings. I just use what works for me; I have not managed to do better.
"""
"""
pipe.enable_xformers_memory_efficient_attention()
xformer cannot be used in flux pipelines, but sd1.5 or sdxl, pony are suitable for other pipelines, and are very fast and have low usage.
Error message: UnboundLocalError: cannot access local variable 'attn_output' where it is not associated with a value
I don’t have a solution yet
"""
pipe.reset_device_map()
pipe.enable_vae_slicing()
"""
pipe.enable_attention_slicing()
pipe.to('cuda')
It is recommended not to enable before enable_sequential_cpu_offload
pipe.enable_model_cpu_offload()
Using it alone is similar to using CUDA, but it saves nearly twice as much memory and is slower than completely offloading to the GPU.
pipe.enable_attention_slicing("max")
pipe.to(dt)
pipe.to(xxxxx) can load some parameters externally instead of in the pipeline
"""
pipe.enable_sequential_cpu_offload(gpu_id=0, device='cuda')
pp = """
A highly detailed and imaginative image of a unique cocktail in a tall glass, designed to resemble an aquarium. The drink is clear with ice cubes, and inside the glass are elements that mimic an underwater scene: vibrant green aquatic plants, bright orange flowers, and small decorative objects resembling fish swimming through the drink. The glass is garnished with a thin slice of lemon on the rim and a straw placed elegantly into the drink. The background is a dimly lit, warm-toned bar, with the soft glow of ambient lights creating a cosy atmosphere. The overall composition is creative and whimsical, blending the art of mixology with a playful and visually captivating underwater theme.
"""
seeds = rams_only_num()
image = pipe(
    prompt=pp,
    guidance_scale=3.5,
    # Steps are generally limited to 1-100; 10-20 is typical, and the official recommendation is 50+ for high quality.
    # With my setup it is best not to go below 12-15, otherwise image quality may suffer.
    num_inference_steps=15,
    height=768,
    width=1232,
    generator=torch.Generator('cuda:0').manual_seed(seeds),
    max_sequence_length=512,
    # guidance_rescale=0.7,
).images[0]
"""
You don't actually need to create the Generator on 'cuda:0'; a default (CPU) generator works too.
"""
name_out_path = f'flux-ala-{seeds}.png'
image.save(name_out_path)
print(' [+] ', name_out_path, ' --- Well Done...\n')
image.show()
print(' [!] ', name_out_path, ' --- showing img...\n')
"""
All the above content is for reference only. Under this condition, it takes about 1 minute for me to generate an image. The memory usage is up to 59GB, the video memory usage is only 1g, but the GPU usage is 30%.
I am considering how to better allocate memory and video memory. I am also trying to find information. If someone can communicate with me, I would be very grateful.
The specific parameters still need to be tried to get better image quality.
If you have more questions, maybe checking the official documentation or asking the llm big model is a good choice.
"""