SDXL TensorRT-RTX BF16 Ampere

TensorRT-RTX optimized engines for Stable Diffusion XL on the NVIDIA Ampere architecture (RTX 30 series, compute capability 8.6) with BF16 precision.

Model Details

  • Base Model: stabilityai/stable-diffusion-xl-base-1.0
  • Architecture: AMPERE (Compute Capability 8.6)
  • Precision: BF16 (16-bit brain floating point)
  • TensorRT-RTX Version: 1.0.0.21
  • Image Resolution: 1024x1024
  • Batch Size: 1 (static)

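Because the engines are compiled with static shapes (batch size 1, 1024x1024 output), the tensor dimensions each stage accepts are fixed at build time. The sketch below lists the shapes a standard SDXL pipeline would feed the four engines; it is an assumption based on the stock SDXL layout, and the actual binding names inside the .plan files may differ.

# Fixed I/O shapes implied by batch size 1 and 1024x1024 output
# (standard SDXL layout; binding names inside the engines may differ)
EXPECTED_SHAPES = {
    "clip":   {"input_ids": (1, 77)},            # tokenized prompt, 77 tokens
    "clip2":  {"input_ids": (1, 77)},            # second text encoder
    "unetxl": {
        "sample": (1, 4, 128, 128),              # latents: 1024 / 8 = 128
        "encoder_hidden_states": (1, 77, 2048),  # concatenated CLIP embeddings
    },
    "vae":    {"latent": (1, 4, 128, 128)},      # decoder input -> 1024x1024 RGB
}
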
Engine Files

This repository contains four TensorRT-RTX engine files:

  • clip.trt1.0.0.21.plan - CLIP text encoder
  • clip2.trt1.0.0.21.plan - CLIP text encoder 2
  • unetxl.trt1.0.0.21.plan - U-Net XL diffusion model
  • vae.trt1.0.0.21.plan - VAE decoder

Total Size: 6.5 GB

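Together, the four engines cover the full SDXL pipeline: prompt encoding (clip, clip2), iterative denoising (unetxl), and latent decoding (vae). A minimal loading sketch follows; it assumes the .plan files can be deserialized with the standard TensorRT Python runtime API, and the directory path is a placeholder.

# Minimal sketch: deserialize the four .plan engines
# (assumes the standard TensorRT Python runtime API works for these files)
import glob
import os
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

engines = {}
for path in glob.glob("path/to/engines/*.plan"):
    name = os.path.basename(path).split(".")[0]  # "clip", "clip2", "unetxl", "vae"
    with open(path, "rb") as f:
        engines[name] = runtime.deserialize_cuda_engine(f.read())

contexts = {name: eng.create_execution_context() for name, eng in engines.items()}
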
Hardware Requirements

  • NVIDIA Ampere RTX GPU (RTX 3060, 3070, 3080, or 3090)
  • Compute Capability 8.6
  • 12 GB VRAM or more recommended
  • TensorRT-RTX 1.0.0.21 runtime

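A quick way to confirm the GPU matches these requirements before loading the engines is sketched below. It relies on PyTorch's CUDA queries purely for convenience; that dependency is an assumption, not something the engines themselves require.

# Sketch: verify compute capability 8.6 and available VRAM (assumes PyTorch)
import torch

assert torch.cuda.is_available(), "No CUDA device found"
major, minor = torch.cuda.get_device_capability(0)
assert (major, minor) == (8, 6), f"Engines target SM 8.6, found SM {major}.{minor}"

vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
assert vram_gb >= 12, f"At least 12 GB VRAM recommended, found {vram_gb:.1f} GB"
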
Usage

# Example usage with the TensorRT-RTX backend
from imageai_server.shared.tensorrt_rtx_backend import TensorRTRTXBackend

# Create the backend and point it at the directory containing the .plan files
backend = TensorRTRTXBackend()
backend.load_engines("path/to/engines")

# Runs the full pipeline: text encoding -> UNet denoising loop -> VAE decode
image = backend.generate("A beautiful sunset over mountains")

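If generate() returns a PIL image (an assumption about the imageai_server API, which is not documented here), the result can be saved directly:

# Assumes generate() returns a PIL.Image.Image; adjust if the backend
# returns a NumPy array or tensor instead
image.save("sunset_1024x1024.png")
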
Performance

  • Inference Speed: ~2-3 seconds per image (RTX 3090)
  • Memory Usage: ~6-8 GB VRAM
  • Optimizations: static shapes, BF16 precision, Ampere-specific kernels

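The latency figure above is a rough end-to-end number. To reproduce it on your own hardware, a simple timing loop like the one below works; it reuses the backend object from the Usage section and excludes the first call so engine warm-up is not counted.

# Rough latency measurement (reuses the backend from the Usage example)
import time

backend.generate("warm-up prompt")  # first call includes engine warm-up

runs = 5
start = time.perf_counter()
for _ in range(runs):
    backend.generate("A beautiful sunset over mountains")
elapsed = (time.perf_counter() - start) / runs
print(f"Average latency: {elapsed:.2f} s/image")
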
License

This model is released under the same license as the base SDXL model (CreativeML Open RAIL++-M).
