---
license: apache-2.0
base_model:
- google/flan-t5-base
- laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
datasets:
- AbstractPhil/human-templated-captions-1b
---
## Simple Summary

This project provides an advanced text control system for any AI generator built on CLIP-ViT-bigG-14-laion2B-39B-b160k (also known as CLIP_G).
It lets you “steer” how AI interprets your written prompts by adding a smart adapter between the text input and the image model. By fine-tuning how the prompt is understood, you get more accurate, creative, or controllable AI-generated images—especially in complex or multi-style models like Stable Diffusion XL.
## More Technical Summary
This repository contains code, configuration, and weights for the Dual Shunt Adapter: a modular cross-attention prompt embedding controller designed for SDXL and multi-CLIP diffusion systems. The adapter bridges T5 (or other transformer) text encoders with CLIP-based pooled embedding spaces, providing delta, gate, log_sigma, anchor, and guidance outputs for per-token, per-field semantic modulation. Compatible with custom and parallel CLIP streams (e.g., SDXL’s CLIP-L/CLIP-G), the system enables targeted latent field steering, dynamic classifier-free guidance, and localized prompt injection for advanced generative workflows—including direct integration with ComfyUI and HuggingFace Diffusers.
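To make the data flow concrete, the following is a minimal, illustrative sketch (not the repository's actual API) of how an adapter of this kind could cross-attend CLIP_G tokens over T5 hidden states and apply its delta, gate, log_sigma, anchor, and guidance fields. The `ToyShuntAdapter` class, its dimensions, and the blending rule in `apply_shunt` are assumptions for illustration; the real implementation is in `model.py` and may differ.

```python
# Illustrative sketch only: names, dimensions, and the blending rule are assumptions,
# not the code shipped in model.py.
import torch
import torch.nn as nn

class ToyShuntAdapter(nn.Module):
    """Cross-attends CLIP_G tokens over T5 tokens and emits per-token control fields."""

    def __init__(self, t5_dim: int = 768, clip_dim: int = 1280, heads: int = 8):
        super().__init__()
        self.proj_t5 = nn.Linear(t5_dim, clip_dim)            # map T5 features into CLIP_G space
        self.cross_attn = nn.MultiheadAttention(clip_dim, heads, batch_first=True)
        self.delta_head = nn.Linear(clip_dim, clip_dim)       # additive correction per token
        self.gate_head = nn.Linear(clip_dim, 1)               # 0..1 strength of the correction
        self.log_sigma_head = nn.Linear(clip_dim, clip_dim)   # per-dimension uncertainty (log scale)
        self.anchor_head = nn.Linear(clip_dim, clip_dim)      # target the tokens are pulled toward
        self.guidance_head = nn.Linear(clip_dim, 1)           # scalar CFG-style scale suggestion

    def forward(self, t5_seq: torch.Tensor, clip_seq: torch.Tensor) -> dict:
        t5_in = self.proj_t5(t5_seq)
        attended, _ = self.cross_attn(query=clip_seq, key=t5_in, value=t5_in)
        return {
            "delta": self.delta_head(attended),
            "gate": torch.sigmoid(self.gate_head(attended)),
            "log_sigma": self.log_sigma_head(attended),
            "anchor": self.anchor_head(attended),
            "guidance": self.guidance_head(attended).mean(dim=1),  # one scalar per prompt
        }

def apply_shunt(clip_seq: torch.Tensor, out: dict, strength: float = 1.0) -> torch.Tensor:
    """One plausible way to steer CLIP_G embeddings with the adapter's output fields."""
    steered = clip_seq + strength * out["gate"] * out["delta"]   # gated additive shift
    sigma = torch.exp(out["log_sigma"])                          # uncertainty weighting
    return steered + (out["anchor"] - steered) / (1.0 + sigma)   # pull toward the anchor

if __name__ == "__main__":
    t5_seq = torch.randn(1, 77, 768)     # e.g. flan-t5-base hidden states
    clip_seq = torch.randn(1, 77, 1280)  # e.g. CLIP_G per-token embeddings
    adapter = ToyShuntAdapter()
    steered = apply_shunt(clip_seq, adapter(t5_seq, clip_seq))
    print(steered.shape)                 # torch.Size([1, 77, 1280])
```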
## Code
The model code lives in `model.py`. Inference code will be available in the long-winded article.
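Until that article is published, here is a rough, non-authoritative sketch of where the steered embeddings could be routed into a Hugging Face Diffusers SDXL pipeline. Only the Diffusers calls shown are real; the `steer_with_adapter` helper is a placeholder for whatever `model.py` actually exposes.

```python
# Sketch of wiring steered embeddings into an SDXL pipeline via Diffusers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a glass greenhouse at dusk, volumetric light"
(prompt_embeds, neg_embeds,
 pooled, neg_pooled) = pipe.encode_prompt(prompt=prompt, device="cuda",
                                          num_images_per_prompt=1,
                                          do_classifier_free_guidance=True)

# SDXL concatenates CLIP-L (768-dim) and CLIP_G (1280-dim) token features; the
# adapter would modulate the CLIP_G slice before denoising begins.
clip_g_part = prompt_embeds[..., 768:]
# clip_g_part = steer_with_adapter(t5_hidden_states, clip_g_part)  # hypothetical call
prompt_embeds = torch.cat([prompt_embeds[..., :768], clip_g_part], dim=-1)

image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_embeds,
             pooled_prompt_embeds=pooled, negative_pooled_prompt_embeds=neg_pooled,
             num_inference_steps=30).images[0]
image.save("steered.png")
```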