Flux.1-dev TensorRT-RTX BF16 Ampere

TensorRT-RTX optimized engines for Flux.1-dev on the NVIDIA Ampere architecture (RTX 30 series, Compute Capability 8.6) with BF16 precision.

Model Details

  • Base Model: black-forest-labs/FLUX.1-dev
  • Architecture: AMPERE (Compute Capability 8.6)
  • Precision: BF16 (16-bit brain floating point)
  • TensorRT-RTX Version: 1.0.0.21
  • Image Resolution: 1024x1024
  • Batch Size: 1 (static)

Engine Files

This repository contains four TensorRT-RTX engine files:

  • clip.plan - CLIP text encoder
  • t5.plan - T5 text encoder
  • transformer.plan - Flux transformer model
  • vae.plan - VAE decoder

Total Size: 17.3GB
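Before constructing the pipeline, it is worth confirming that all four engine files listed above are actually present. A minimal sketch (the directory path is an assumption; point it at wherever you downloaded this repository):

```python
# Check that every expected .plan file from the list above exists locally.
# The engine directory is an assumption -- adjust to your download location.
from pathlib import Path

EXPECTED_PLANS = ("clip.plan", "t5.plan", "transformer.plan", "vae.plan")

def missing_engines(engine_dir):
    """Return the names of any expected .plan files not found in engine_dir."""
    root = Path(engine_dir)
    return [name for name in EXPECTED_PLANS if not (root / name).is_file()]
```

Running this against the repository root should return an empty list; any names it returns correspond to engines that still need to be downloaded.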

Hardware Requirements

  • NVIDIA RTX 30 series GPU (e.g. RTX 3090)
  • Compute Capability 8.6 (note: the A100 is CC 8.0, so these engines will not run on it)
  • 24GB VRAM or more recommended
  • TensorRT-RTX 1.0.0.21 runtime

Usage

# Example usage with TensorRT-RTX backend
from nvidia_demos.TensorRT_RTX.demo.flux1_dev.pipelines.flux_pipeline import FluxPipeline

pipeline = FluxPipeline(
    cache_dir="./cache",
    hf_token="your_hf_token"
)

# Load pre-built engines
pipeline.load_engines(
    transformer_precision="bf16",
    opt_batch_size=1,
    opt_height=1024,
    opt_width=1024
)

# Generate image
image = pipeline.infer(
    prompt="A beautiful landscape with mountains",
    height=1024,
    width=1024
)

Performance

  • Inference Speed: ~8-12 seconds per image (RTX 3090)
  • Memory Usage: ~18-20GB VRAM
  • Optimizations: Static shapes, BF16 precision, Ampere-specific kernels
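To reproduce latency figures like the ~8-12 s quoted above, a simple wall-clock wrapper is enough. A minimal sketch; `run` stands in for any callable, e.g. your own `pipeline.infer` invocation:

```python
# Measure per-image latency for a single call to an inference function.
import time

def time_call(run, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to run()."""
    start = time.perf_counter()
    result = run(*args, **kwargs)
    return result, time.perf_counter() - start
```

For GPU workloads, call `torch.cuda.synchronize()` before starting and before stopping the timer so the measurement includes kernel completion rather than just launch time, and discard the first (warm-up) run.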

License

This model follows the Flux.1-dev license terms. Please refer to the original model repository for licensing details.
