---
library_name: transformers
tags: []
---

# Scaling Down Text Encoders of Text-to-Image Diffusion Models

Official repository of the paper *[Scaling Down Text Encoders of Text-to-Image Diffusion Models](https://github.com/LifuWang-66/DistillT5)*.

Project Page: https://github.com/LifuWang-66/DistillT5.git

## Model Descriptions:

T5-Base distilled from [T5-XXL](https://huggingface.co/google/flan-t5-xxl) using [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev). It is 50 times smaller than T5-XXL and retains most of its capabilities.

## Generation Results:

## Usage:

1. Set up the environment:

```
git clone https://github.com/LifuWang-66/DistillT5.git
cd DistillT5
conda create -n distillt5 python=3.12
conda activate distillt5
pip install -r requirements.txt
pip install ./diffusers
```

2. Inference

```py
import os
import sys

# Add the parent of this script's directory to the import path so that
# `models.T5_encoder` from the cloned repository can be found.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

import torch
from diffusers import FluxPipeline
from models.T5_encoder import T5EncoderWithProjection

# Load the Flux pipeline and the distilled T5-Base text encoder.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.float16)
text_encoder = T5EncoderWithProjection.from_pretrained('LifuWang/DistillT5', torch_dtype=torch.float16)

# Replace the pipeline's T5-XXL text encoder with the distilled one.
pipe.text_encoder_2 = text_encoder
pipe = pipe.to('cuda')

prompt = "Photorealistic portrait of a stylish young woman wearing a futuristic golden sequined bodysuit that catches the light, creating a metallic, mirror-like effect. She is wearing large, reflective blue-tinted aviator sunglasses. Over her head, she wears headphones with metallic accents, giving a modern, cyber aesthetic."

# Generate a single image and save it to disk.
image = pipe(prompt=prompt, num_images_per_prompt=1, guidance_scale=3.5, num_inference_steps=20).images[0]
image.save("t5_base.png")
```
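To check the size reduction stated in the model description on your own setup, the parameter counts of the two text encoders can be compared. The following is a minimal sketch, not part of the repository: `count_parameters` and `size_ratio` are hypothetical helper names, and they only assume that each encoder exposes a PyTorch-style `.parameters()` method whose items have `.numel()`.

```python
from typing import Any


def count_parameters(module: Any) -> int:
    """Return the total number of scalar parameters in `module`.

    Assumes `module.parameters()` yields objects with a `.numel()` method,
    as torch.nn.Module does.
    """
    return sum(p.numel() for p in module.parameters())


def size_ratio(large: Any, small: Any) -> float:
    """Return how many times more parameters `large` has than `small`."""
    small_count = count_parameters(small)
    if small_count == 0:
        raise ValueError("small module has no parameters")
    return count_parameters(large) / small_count
```

In the inference script, one way to use this would be to keep a reference to the original encoder before the swap (for example `original = pipe.text_encoder_2`) and then call `size_ratio(original, text_encoder)`.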