Cédric
PRX Part 3 — Training a Text-to-Image Model in 24h!
Text-to-image Architectural Experiments
Training Design for Text-to-Image Models: Lessons from Ablations
Finegrain Light Switcher (Lite Version)
Instantly turn lamps on in your images
Finegrain Object Eraser (Lite Version)
Erase any object from an image with just a prompt
Finegrain Object Cutter
Create HD cutouts from any image with just a prompt
zeroing and reshaping the text-related cross-attentions into self-attentions
It's actually narrowing, not zeroing (even though strategy="zeros" is passed to StateDictAdapter).
For instance, the logs show:
Adapting down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight by narrowing from shape torch.Size([320, 768]) to torch.Size([320, 320])
So the extra weights are just discarded in this case. Zero-filling is only used when expanding tensors to larger shapes.
Corresponding code: link.
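The narrow-or-zero-fill behavior described above can be sketched in a few lines of PyTorch. Note that adapt_tensor is a hypothetical helper written for illustration, not the actual StateDictAdapter API:

```python
import torch

def adapt_tensor(weight: torch.Tensor, target_shape: tuple) -> torch.Tensor:
    """Adapt a checkpoint tensor to a target shape, per dimension:
    narrow (slice off the extra weights) when the target is smaller,
    zero-fill when it is larger."""
    out = weight
    for dim, (src, tgt) in enumerate(zip(weight.shape, target_shape)):
        if tgt < src:
            out = out.narrow(dim, 0, tgt)  # extra weights are discarded
        elif tgt > src:
            pad = torch.zeros(*out.shape[:dim], tgt - src, *out.shape[dim + 1:],
                              dtype=out.dtype)
            out = torch.cat([out, pad], dim=dim)  # expand with zeros
    return out

# attn2.to_k.weight: text dim 768 -> image latent dim 320, as in the log above
w = torch.randn(320, 768)
print(adapt_tensor(w, (320, 320)).shape)  # torch.Size([320, 320])
```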
This is made with a one-step SD1.5 LBM [1] eraser!
Data is open. Data pipeline is open. Training code is open.
On our LBM fork: https://github.com/finegrain-ai/LBM
[1] LBM: Latent Bridge Matching for Fast Image-to-Image Translation (2503.07535)
Today we have trained an LBM [2] promptless inpainter using
Re-LAION-Caption19M [3]. We use a subset of 1.25M images with
aesthetic_score > 5.6 and pwatermark < 0.2, with LaMa [4] mask generation. Two takeaways:
🖼 Inpainting is better than in our RORD experiments [5]
🦶 "4 steps" outperforms single-step
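The subset selection described above boils down to a predicate over the caption metadata. A minimal sketch (the record layout here is an assumption; with 🤗 Datasets the same predicate can be passed to Dataset.filter):

```python
# Hypothetical sketch: select the training subset from caption metadata.
# Field names follow the post; the record structure is an assumption.
def keep(sample: dict) -> bool:
    return sample["aesthetic_score"] > 5.6 and sample["pwatermark"] < 0.2

records = [
    {"url": "a.jpg", "aesthetic_score": 6.1, "pwatermark": 0.05},
    {"url": "b.jpg", "aesthetic_score": 5.9, "pwatermark": 0.40},  # too likely watermarked
    {"url": "c.jpg", "aesthetic_score": 4.8, "pwatermark": 0.01},  # aesthetic score too low
]
subset = [r for r in records if keep(r)]
print([r["url"] for r in subset])  # ['a.jpg']
```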
[1] Finegrain LBM Fork: https://github.com/finegrain-ai/LBM
[2] LBM: Latent Bridge Matching for Fast Image-to-Image Translation (2503.07535)
[3] supermodelresearch/Re-LAION-Caption19M
[4] Resolution-robust Large Mask Inpainting with Fourier Convolutions (2109.07161)
[5] https://huggingface.co/posts/piercus/778833977889788
cc @supermodelresearch @presencesw
When repurposing a T2I model into a pure I2I model, there’s always that orphaned text path — what do we do with it? 🤔
You can reuse it as learnable embeddings in multi-task setups [2], freeze an empty text prompt, or distill or prune the corresponding part.
In LBM, they take a clever route — zeroing [3] and reshaping [4] the text-related cross-attentions into self-attentions.
This gives you fresh weights for I2I computation, nicely integrated into your SD architecture.
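That route can be sketched in plain PyTorch, assuming SD1.5's usual dimensions (inner dim 320 in the first block, text dim 768): the cross-attention key/value projections are narrowed to the image dim, then fed the image hidden states instead of text embeddings, which makes the block self-attention:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: re-wire an SD-style cross-attention block (attn2)
# so key/value read the image hidden states instead of text embeddings.
dim, text_dim = 320, 768

to_q = nn.Linear(dim, dim, bias=False)
to_k = nn.Linear(text_dim, dim, bias=False)  # cross-attn: keys from text
to_v = nn.Linear(text_dim, dim, bias=False)  # cross-attn: values from text

# Adapt to self-attention: narrow the text-dim weights down to the image dim.
to_k_self = nn.Linear(dim, dim, bias=False)
to_v_self = nn.Linear(dim, dim, bias=False)
with torch.no_grad():
    to_k_self.weight.copy_(to_k.weight[:, :dim])
    to_v_self.weight.copy_(to_v.weight[:, :dim])

x = torch.randn(1, 64, dim)  # image tokens only, no text conditioning
q, k, v = to_q(x), to_k_self(x), to_v_self(x)
out = torch.softmax(q @ k.transpose(-2, -1) / dim**0.5, dim=-1) @ v
print(out.shape)  # torch.Size([1, 64, 320])
```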
📎 References
[1] Our LBM Fork: https://github.com/finegrain-ai/LBM
[2] OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting (2503.08677)
[3] LBM Zeroing: https://github.com/gojasper/LBM/blob/cafebc46a9ac16dcc61691d289cc4676b5c75380/examples/training/train_lbm_surface.py#L147-L148
[4] LBM Reshaping: https://github.com/gojasper/LBM/blob/cafebc46a9ac16dcc61691d289cc4676b5c75380/examples/training/train_lbm_surface.py#L100
SOTA OCR with Core ML and dots.ocr
🚀 1-step only inference, no distillation
🪶 Light backbone: SD1.5
🧠 Light training: converges in 6k steps
Now let's improve this, especially the inpainting capabilities. Stay tuned for more :-)
LBM paper: LBM: Latent Bridge Matching for Fast Image-to-Image Translation (2503.07535)
Our LBM fork: https://github.com/finegrain-ai/LBM
Swift 🧨Diffusers - Fast Stable Diffusion for Mac
Our fork: https://github.com/finegrain-ai/LBM
LBM paper: LBM: Latent Bridge Matching for Fast Image-to-Image Translation (2503.07535)
LBM relighting demo: jasperai/LBM_relighting
Finegrain Product Placement LoRA
Flux Kontext extended with product placement capabilities
Finegrain Image Enhancer
Clarity AI Upscaler Reproduction