T5-Flan-Base encoder/decoder variation.

Checkpoints are available in "summarize:" or "caption:" task-prefix versions, or as no-caption (anchored-only) versions, which do not require the anchor toggle to be enabled.

Simple Summary

This project provides an advanced text control system for any AI generator that uses CLIP-ViT-bigG-14-laion2B-39B-b160k (also known as CLIP_G) as its text-encoder basis.

It lets you “steer” how AI interprets your written prompts by adding a smart adapter between the text input and the image model. By fine-tuning how the prompt is understood, you get more accurate, creative, or controllable AI-generated images—especially in complex or multi-style models like Stable Diffusion XL.

More Technical Summary

This repository contains code, configuration, and weights for the Dual Shunt Adapter: a modular cross-attention prompt embedding controller designed for SDXL and multi-CLIP diffusion systems. The adapter bridges T5 (or other transformer) text encoders with CLIP-based pooled embedding spaces, providing delta, gate, log_sigma, anchor, and guidance outputs for per-token, per-field semantic modulation. Compatible with custom and parallel CLIP streams (e.g., SDXL’s CLIP-L/CLIP-G), the system enables targeted latent field steering, dynamic classifier-free guidance, and localized prompt injection for advanced generative workflows—including direct integration with ComfyUI and HuggingFace Diffusers.
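As a rough sketch of the interface described above, the adapter can be pictured as fusing per-token T5 encoder states with a CLIP pooled embedding and emitting the five output fields. All names, dimensions, fusion rules, and projections below are illustrative assumptions, not the repository's actual implementation (see model.py for that):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DualShuntSketch:
    """Toy stand-in for the Dual Shunt Adapter's interface.

    Fuses per-token T5 encoder states with a CLIP pooled embedding,
    then projects to the five output fields named in the summary.
    Purely illustrative; not the repository's architecture.
    """

    def __init__(self, t5_dim=768, clip_dim=1280, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *s: rng.standard_normal(s) * 0.02
        self.w_q = init(t5_dim, clip_dim)         # lift T5 tokens into CLIP space
        self.w_delta = init(clip_dim, clip_dim)   # per-token shift of the CLIP field
        self.w_gate = init(clip_dim, 1)           # per-token blend strength
        self.w_sigma = init(clip_dim, 1)          # per-token log-uncertainty
        self.w_anchor = init(clip_dim, clip_dim)  # pooled anchor embedding
        self.w_guide = init(clip_dim, 1)          # scalar guidance scale

    def __call__(self, t5_tokens, clip_pooled):
        # t5_tokens: (seq, t5_dim); clip_pooled: (clip_dim,)
        h = t5_tokens @ self.w_q + clip_pooled    # fuse the two streams (toy rule)
        delta = h @ self.w_delta                  # (seq, clip_dim) semantic shift
        gate = sigmoid(h @ self.w_gate)           # (seq, 1), in [0, 1]
        log_sigma = h @ self.w_sigma              # (seq, 1)
        anchor = h.mean(axis=0) @ self.w_anchor   # (clip_dim,)
        guidance = float(sigmoid(h @ self.w_guide).mean())  # scalar
        return delta, gate, log_sigma, anchor, guidance

# A steered pooled embedding could then look like:
#   clip_pooled + (gate * delta).mean(axis=0)
```

The gate lets the system apply the delta selectively per token, which is what makes targeted, localized field steering possible rather than a uniform shift of the whole embedding.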

The "no captions" versions are trained entirely against a zero-prompt state: the adapter is given no prompt as a baseline and is forced to learn the null space by comparing Flan-T5-Base encodings against the anchored prompt. This has been the most robust implementation so far: the outcomes show the best visual results, and these weights are the most difficult to corrupt or damage through additional training.
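One way to picture the zero-prompt objective is as pulling the adapter's null-prompt output toward its anchored-prompt output, so the empty prompt defines a learned null-space baseline. The following MSE formulation is a hedged guess at the shape of such an objective, not the actual loss (the real one is in the training notebook):

```python
import numpy as np

def null_space_loss(out_null, out_anchored):
    """Toy objective for the 'no captions' regime: measure how far the
    adapter's zero-prompt (null) output sits from the anchored-prompt
    output. Purely illustrative; the repository's loss may differ.

    out_null, out_anchored: arrays of identical shape, e.g. (seq, dim).
    """
    diff = out_null - out_anchored
    return float(np.mean(diff ** 2))
```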

Adding the "noise" variation to the "no captions" normalized weights yields a higher response and potency on very short or brief prompts: during training, random tokens are planted throughout the caption at various locations. This bleeds additional information into the model slowly, while still letting it converge more rapidly than the non-noise alternative, which hard-commits to memorized encodings.
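The caption-corruption step behind the noise variation can be illustrated with a toy function (a sketch only; the function name, noise fraction, and sampling scheme are assumptions, not the repository's actual procedure):

```python
import random

def plant_noise_tokens(caption_tokens, vocab, noise_frac=0.1, rng=None):
    """Return a copy of caption_tokens with roughly noise_frac of the
    positions replaced by random vocabulary tokens, mimicking the
    'noise' caption variant described above (illustrative only)."""
    rng = rng or random.Random(0)
    tokens = list(caption_tokens)
    n_noise = max(1, int(len(tokens) * noise_frac))
    for pos in rng.sample(range(len(tokens)), n_noise):
        tokens[pos] = rng.choice(vocab)  # plant a random token here
    return tokens
```

Usage: `plant_noise_tokens("a red fox jumping over a fence".split(), vocab=["tree", "sky", "cat"])` returns the caption with one position swapped for a random vocabulary word.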

Code

The model code is in model.py. Inference code will be available in the long-winded article.

The SDXL shunt inference code ships with the ComfyUI release hosted on GitHub.

The training notebook is also available now, as I just pushed it.
