SDXL TensorRT-RTX BF16 Ampere

TensorRT-RTX optimized engines for Stable Diffusion XL on the NVIDIA Ampere architecture (RTX 30 series, compute capability 8.6) with BF16 precision.

Model Details

  • Base Model: stabilityai/stable-diffusion-xl-base-1.0
  • Architecture: AMPERE (Compute Capability 8.6)
  • Precision: BF16 (16-bit brain floating point)
  • TensorRT-RTX Version: 1.0.0.21
  • Image Resolution: 1024x1024
  • Batch Size: 1 (static)

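Because the engines are compiled with static shapes (batch size 1, 1024x1024 output), the tensor dimensions each stage accepts are fixed at build time. The sketch below lists the shapes a standard SDXL pipeline would feed the four engines; it is an assumption based on the stock SDXL layout, and the actual binding names inside the .plan files may differ.

# Fixed I/O shapes implied by batch size 1 and 1024x1024 output
# (standard SDXL layout; binding names inside the engines may differ)
EXPECTED_SHAPES = {
    "clip":   {"input_ids": (1, 77)},            # tokenized prompt, 77 tokens
    "clip2":  {"input_ids": (1, 77)},            # second text encoder
    "unetxl": {
        "sample": (1, 4, 128, 128),              # latents: 1024 / 8 = 128
        "encoder_hidden_states": (1, 77, 2048),  # concatenated CLIP embeddings
    },
    "vae":    {"latent": (1, 4, 128, 128)},      # decoder input -> 1024x1024 RGB
}
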
Engine Files

This repository contains four TensorRT-RTX engine files:

  • clip.trt1.0.0.21.plan - CLIP text encoder
  • clip2.trt1.0.0.21.plan - CLIP text encoder 2
  • unetxl.trt1.0.0.21.plan - U-Net XL diffusion model
  • vae.trt1.0.0.21.plan - VAE decoder

Total Size: 6.5 GB

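Together, the four engines cover the full SDXL pipeline: prompt encoding (clip, clip2), iterative denoising (unetxl), and latent decoding (vae). A minimal loading sketch follows; it assumes the .plan files can be deserialized with the standard TensorRT Python runtime API, and the directory path is a placeholder.

# Minimal sketch: deserialize the four .plan engines
# (assumes the standard TensorRT Python runtime API works for these files)
import glob
import os
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

engines = {}
for path in glob.glob("path/to/engines/*.plan"):
    name = os.path.basename(path).split(".")[0]  # "clip", "clip2", "unetxl", "vae"
    with open(path, "rb") as f:
        engines[name] = runtime.deserialize_cuda_engine(f.read())

contexts = {name: eng.create_execution_context() for name, eng in engines.items()}
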
Hardware Requirements

  • NVIDIA Ampere RTX GPU (RTX 3060, 3070, 3080, or 3090)
  • Compute Capability 8.6
  • 12 GB VRAM or more recommended
  • TensorRT-RTX 1.0.0.21 runtime

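A quick way to confirm the GPU matches these requirements before loading the engines is sketched below. It relies on PyTorch's CUDA queries purely for convenience; that dependency is an assumption, not something the engines themselves require.

# Sketch: verify compute capability 8.6 and available VRAM (assumes PyTorch)
import torch

assert torch.cuda.is_available(), "No CUDA device found"
major, minor = torch.cuda.get_device_capability(0)
assert (major, minor) == (8, 6), f"Engines target SM 8.6, found SM {major}.{minor}"

vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
assert vram_gb >= 12, f"At least 12 GB VRAM recommended, found {vram_gb:.1f} GB"
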
Usage

# Example usage with the TensorRT-RTX backend
from imageai_server.shared.tensorrt_rtx_backend import TensorRTRTXBackend

# Create the backend and point it at the directory containing the .plan files
backend = TensorRTRTXBackend()
backend.load_engines("path/to/engines")

# Runs the full pipeline: text encoding -> UNet denoising loop -> VAE decode
image = backend.generate("A beautiful sunset over mountains")

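If generate() returns a PIL image (an assumption about the imageai_server API, which is not documented here), the result can be saved directly:

# Assumes generate() returns a PIL.Image.Image; adjust if the backend
# returns a NumPy array or tensor instead
image.save("sunset_1024x1024.png")
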
Performance

  • Inference Speed: ~2-3 seconds per image (RTX 3090)
  • Memory Usage: ~6-8 GB VRAM
  • Optimizations: static shapes, BF16 precision, Ampere-specific kernels

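The latency figure above is a rough end-to-end number. To reproduce it on your own hardware, a simple timing loop like the one below works; it reuses the backend object from the Usage section and excludes the first call so engine warm-up is not counted.

# Rough latency measurement (reuses the backend from the Usage example)
import time

backend.generate("warm-up prompt")  # first call includes engine warm-up

runs = 5
start = time.perf_counter()
for _ in range(runs):
    backend.generate("A beautiful sunset over mountains")
elapsed = (time.perf_counter() - start) / runs
print(f"Average latency: {elapsed:.2f} s/image")
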
License

This model is released under the same license as the base SDXL model (CreativeML Open RAIL++-M).
