Flux.1-dev TensorRT-RTX BF16 Ampere

TensorRT-RTX optimized engines for Flux.1-dev on the NVIDIA Ampere architecture (RTX 30 series, Compute Capability 8.6) with BF16 precision.

Model Details

  • Base Model: black-forest-labs/FLUX.1-dev
  • Architecture: AMPERE (Compute Capability 8.6)
  • Precision: BF16 (16-bit brain floating point)
  • TensorRT-RTX Version: 1.0.0.21
  • Image Resolution: 1024x1024
  • Batch Size: 1 (static)

Engine Files

This repository contains four TensorRT-RTX engine files:

  • clip.plan - CLIP text encoder
  • t5.plan - T5 text encoder
  • transformer.plan - Flux transformer model
  • vae.plan - VAE decoder

Total Size: 17.3GB
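Before constructing the pipeline, it is worth confirming that all four engine files listed above are actually present. A minimal sketch (the directory path is an assumption; point it at wherever you downloaded this repository):

```python
# Check that every expected .plan file from the list above exists locally.
# The engine directory is an assumption -- adjust to your download location.
from pathlib import Path

EXPECTED_PLANS = ("clip.plan", "t5.plan", "transformer.plan", "vae.plan")

def missing_engines(engine_dir):
    """Return the names of any expected .plan files not found in engine_dir."""
    root = Path(engine_dir)
    return [name for name in EXPECTED_PLANS if not (root / name).is_file()]
```

Running this against the repository root should return an empty list; any names it returns correspond to engines that still need to be downloaded.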

Hardware Requirements

  • NVIDIA RTX 30 series GPU (e.g. RTX 3090)
  • Compute Capability 8.6 (note: the A100 is CC 8.0, so these engines will not run on it)
  • 24GB VRAM or more recommended
  • TensorRT-RTX 1.0.0.21 runtime

Usage

# Example usage with TensorRT-RTX backend
from nvidia_demos.TensorRT_RTX.demo.flux1_dev.pipelines.flux_pipeline import FluxPipeline

pipeline = FluxPipeline(
    cache_dir="./cache",
    hf_token="your_hf_token"
)

# Load pre-built engines
pipeline.load_engines(
    transformer_precision="bf16",
    opt_batch_size=1,
    opt_height=1024,
    opt_width=1024
)

# Generate image
image = pipeline.infer(
    prompt="A beautiful landscape with mountains",
    height=1024,
    width=1024
)

Performance

  • Inference Speed: ~8-12 seconds per image (RTX 3090)
  • Memory Usage: ~18-20GB VRAM
  • Optimizations: Static shapes, BF16 precision, Ampere-specific kernels
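To reproduce latency figures like the ~8-12 s quoted above, a simple wall-clock wrapper is enough. A minimal sketch; `run` stands in for any callable, e.g. your own `pipeline.infer` invocation:

```python
# Measure per-image latency for a single call to an inference function.
import time

def time_call(run, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to run()."""
    start = time.perf_counter()
    result = run(*args, **kwargs)
    return result, time.perf_counter() - start
```

For GPU workloads, call `torch.cuda.synchronize()` before starting and before stopping the timer so the measurement includes kernel completion rather than just launch time, and discard the first (warm-up) run.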

License

This model follows the Flux.1-dev license terms. Please refer to the original model repository for licensing details.
