---
license: apache-2.0
base_model:
  - google/flan-t5-base
  - laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
datasets:
  - AbstractPhil/human-templated-captions-1b
---

## Simple Summary

This project provides an advanced text control system for any AI generator built on CLIP-ViT-bigG-14-laion2B-39B-b160k (also known as CLIP_G).

It lets you “steer” how AI interprets your written prompts by adding a smart adapter between the text input and the image model. By fine-tuning how the prompt is understood, you get more accurate, creative, or controllable AI-generated images—especially in complex or multi-style models like Stable Diffusion XL.

## More Technical Summary

This repository contains code, configuration, and weights for the Dual Shunt Adapter: a modular cross-attention prompt embedding controller designed for SDXL and multi-CLIP diffusion systems. The adapter bridges T5 (or other transformer) text encoders with CLIP-based pooled embedding spaces, providing delta, gate, log_sigma, anchor, and guidance outputs for per-token, per-field semantic modulation. Compatible with custom and parallel CLIP streams (e.g., SDXL’s CLIP-L/CLIP-G), the system enables targeted latent field steering, dynamic classifier-free guidance, and localized prompt injection for advanced generative workflows—including direct integration with ComfyUI and HuggingFace Diffusers.
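To make the role of those outputs concrete, here is a minimal sketch of how per-field `delta`, `gate`, `log_sigma`, `anchor`, and `guidance` tensors could be folded back into a pooled CLIP_G embedding. The gated additive update and the anchor blend below are illustrative assumptions, not the exact formulation in model.py:

```python
import torch

def apply_shunt(clip_embed, delta, gate, log_sigma, anchor, guidance_scale=1.0):
    """Hypothetical recombination of Dual Shunt Adapter outputs.

    clip_embed: (batch, dim) pooled CLIP_G embedding
    delta:      (batch, dim) learned semantic offset
    gate:       (batch, dim) raw gate logits, squashed to [0, 1]
    log_sigma:  (batch, dim) log-scale applied to the offset
    anchor:     (batch, dim) target embedding to blend toward
    """
    gate = torch.sigmoid(gate)                              # per-field gating
    steered = clip_embed + gate * torch.exp(log_sigma) * delta
    # Pull the steered embedding toward the anchor, weighted by the guidance scale.
    return steered + guidance_scale * gate * (anchor - steered)
```

In this reading, the gate decides *where* the prompt embedding is allowed to move, the delta and sigma decide *how far*, and the anchor/guidance terms give a handle for classifier-free-guidance-style steering.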

## Code

The model code is present in model.py. Inference code will be available in the long-winded article.
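Until that write-up lands, a hedged sketch of the intended call pattern might look like the following. The `DualShuntAdapter` class name, its constructor, and its forward signature are assumptions for illustration; the real entry point should be checked against model.py. The T5 encoding step uses the standard Hugging Face `transformers` API:

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

# Hypothetical import: the actual class name and loading path live in model.py.
from model import DualShuntAdapter

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
t5 = T5EncoderModel.from_pretrained("google/flan-t5-base").eval()
adapter = DualShuntAdapter()  # in practice, load the weights from this repo

prompt = "a rain-soaked neon alley, cinematic lighting"
ids = tokenizer(prompt, return_tensors="pt").input_ids

# Stand-in for the pooled CLIP_G embedding SDXL would normally provide.
clip_g_pooled = torch.randn(1, 1280)

with torch.no_grad():
    t5_states = t5(input_ids=ids).last_hidden_state  # (1, seq_len, 768) per-token features
    # Hypothetical call: cross-attend the T5 tokens against the CLIP_G embedding
    # and return the steering fields described in the technical summary.
    delta, gate, log_sigma, anchor, guidance = adapter(t5_states, clip_g_pooled)
```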