# Flux.1-dev TensorRT-RTX BF16 Ampere
TensorRT-RTX optimized engines for Flux.1-dev on the NVIDIA Ampere architecture (RTX 30 series, Compute Capability 8.6) with BF16 precision.
## Model Details
- Base Model: black-forest-labs/FLUX.1-dev
- Architecture: AMPERE (Compute Capability 8.6)
- Precision: BF16 (16-bit brain floating point)
- TensorRT-RTX Version: 1.0.0.21
- Image Resolution: 1024x1024
- Batch Size: 1 (static)
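For scripting against these engines, the details above can be mirrored in a small configuration dict. This is a sketch: the key names are illustrative assumptions, not an official schema.

```python
# Build configuration of this repo's engines, mirrored from the list above.
# Key names are illustrative assumptions, not an official schema.
ENGINE_CONFIG = {
    "base_model": "black-forest-labs/FLUX.1-dev",
    "compute_capability": (8, 6),   # Ampere
    "precision": "bf16",
    "trt_rtx_version": "1.0.0.21",
    "height": 1024,
    "width": 1024,
    "batch_size": 1,                # static shape
}
```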
## Engine Files

This repository contains 4 TensorRT engine files:
- `clip.plan`: CLIP text encoder
- `t5.plan`: T5 text encoder
- `transformer.plan`: Flux transformer model
- `vae.plan`: VAE decoder

Total size: 17.3 GB
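Before loading, it can help to verify that all four `.plan` files are present locally. A minimal sketch using only the standard library (the flat directory layout is an assumption):

```python
from pathlib import Path

# The four engine files listed above.
EXPECTED_PLANS = ("clip.plan", "t5.plan", "transformer.plan", "vae.plan")

def missing_engines(engine_dir):
    """Return the names of expected .plan files missing from engine_dir."""
    d = Path(engine_dir)
    return [name for name in EXPECTED_PLANS if not (d / name).is_file()]
```

An empty return value means all four engines were found.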
## Hardware Requirements
- NVIDIA RTX 30 series GPU (e.g. RTX 3080, RTX 3090)
- Compute Capability 8.6 (note: the A100 is Compute Capability 8.0, not 8.6)
- Minimum 24 GB VRAM recommended
- TensorRT-RTX 1.0.0.21 runtime
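TensorRT engines are typically built for a specific compute capability, so a quick guard before loading can give a clearer error than a failed deserialization. A minimal sketch; at runtime you would feed it `torch.cuda.get_device_capability()`, assuming PyTorch is installed and a GPU is visible:

```python
REQUIRED_CAPABILITY = (8, 6)  # Ampere SM 8.6, per the requirements above

def capability_ok(capability):
    """Check a (major, minor) compute-capability tuple against the
    8.6 target these engines were built for."""
    return tuple(capability) == REQUIRED_CAPABILITY
```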
## Usage
```python
# Example usage with the TensorRT-RTX backend
from nvidia_demos.TensorRT_RTX.demo.flux1_dev.pipelines.flux_pipeline import FluxPipeline

pipeline = FluxPipeline(
    cache_dir="./cache",
    hf_token="your_hf_token",
)

# Load pre-built engines
pipeline.load_engines(
    transformer_precision="bf16",
    opt_batch_size=1,
    opt_height=1024,
    opt_width=1024,
)

# Generate an image
image = pipeline.infer(
    prompt="A beautiful landscape with mountains",
    height=1024,
    width=1024,
)
```
## Performance
- Inference Speed: ~8-12 seconds per image (RTX 3090)
- Memory Usage: ~18-20 GB VRAM
- Optimizations: static shapes, BF16 precision, Ampere-specific kernels
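The quoted latency translates into rough throughput; a sketch of the arithmetic:

```python
def images_per_minute(seconds_per_image):
    """Convert per-image latency into images-per-minute throughput."""
    return 60.0 / seconds_per_image

# 8-12 s/image on an RTX 3090 implies roughly 5 to 7.5 images per minute.
```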
## License
This model follows the Flux.1-dev license terms. Please refer to the original model repository for licensing details.
## Built With
- TensorRT-RTX 1.0.0.21
- NVIDIA Flux Demo
- Built on NVIDIA GeForce RTX 3090 (Ampere 8.6)