---
license: apache-2.0
pipeline_tag: text-to-image
---

# Chroma1-HD

Chroma1-HD is an **8.9B** parameter text-to-image foundation model based on **FLUX.1-schnell**. It is fully **Apache 2.0 licensed**, so anyone can use, modify, and build upon it.

As a **base model**, Chroma1 is intentionally designed to be an excellent starting point for **finetuning**. It provides a strong, neutral foundation for developers, researchers, and artists to create specialized models.

For the fast, CFG-"baked" version, see [Chroma1-Flash](https://huggingface.co/lodestones/Chroma1-Flash).

### Key Features

* **High-Performance Base:** 8.9B parameters, built on the FLUX.1 architecture.
* **Easily Finetunable:** Designed as a checkpoint for creating custom, specialized models.
* **Community-Driven & Open-Source:** Fully transparent, with an Apache 2.0 license and an open training history.
* **Flexible by Design:** Provides a foundation for a wide range of generative tasks.

## Special Thanks

A massive thank you to our supporters who make this project possible.

* **Anonymous donor** whose incredible generosity funded the pretraining run and data collection. Your support has been transformative for open-source AI.
* **Fictional.ai** for their fantastic support and for helping push the boundaries of open-source AI. You can try Chroma on their platform: [Fictional.ai](https://fictional.ai/?ref=chroma_hf)

## How to Use

### `diffusers` Library

```python
import torch
from diffusers import ChromaPipeline

# Load the pipeline in bfloat16 to reduce memory use.
pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-HD", torch_dtype=torch.bfloat16)
# Keep idle submodules on the CPU and move each one to the GPU only while it runs.
pipe.enable_model_cpu_offload()

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),  # fixed seed for reproducibility
    num_inference_steps=40,
    guidance_scale=3.0,
    num_images_per_prompt=1,
).images[0]
image.save("chroma.png")
```

### ComfyUI

For advanced users and customized workflows, you can use Chroma with ComfyUI.

**Requirements:**

* A working ComfyUI installation.
* [Chroma checkpoint](https://huggingface.co/lodestones/Chroma) (latest version).
* [T5 XXL Text Encoder](https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors).
* [FLUX VAE](https://huggingface.co/lodestones/Chroma/resolve/main/ae.safetensors).
* [Chroma Workflow JSON](https://huggingface.co/lodestones/Chroma/resolve/main/ChromaSimpleWorkflow20250507.json).

**Setup:**

1. Place the T5 XXL text encoder (`t5xxl_fp16.safetensors`) in your `ComfyUI/models/clip` folder.
2. Place the FLUX VAE (`ae.safetensors`) in your `ComfyUI/models/vae` folder.
3. Place the Chroma checkpoint in your `ComfyUI/models/diffusion_models` folder.
4. Load the Chroma workflow file into ComfyUI and run.
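
The file placement in the setup steps above can also be scripted. The following is a minimal sketch, not an official installer: it assumes the `huggingface_hub` package is installed, and the helper names `plan_downloads` and `fetch_all` are hypothetical. The repository IDs and filenames come from the links in the requirements list. The Chroma checkpoint is deliberately left out, since its filename depends on which version you choose; download it from the repository and place it in `models/diffusion_models` yourself.

```python
from pathlib import Path

# (repo_id, filename, ComfyUI subfolder), taken from the links and steps above.
ASSETS = [
    ("comfyanonymous/flux_text_encoders", "t5xxl_fp16.safetensors", "models/clip"),
    ("lodestones/Chroma", "ae.safetensors", "models/vae"),
]


def plan_downloads(comfy_root):
    """Return (repo_id, filename, target_dir) for each asset, without downloading."""
    root = Path(comfy_root)
    return [(repo, name, root / subdir) for repo, name, subdir in ASSETS]


def fetch_all(comfy_root):
    """Download every planned asset into its ComfyUI folder."""
    # Imported lazily so that planning works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    for repo, name, target_dir in plan_downloads(comfy_root):
        target_dir.mkdir(parents=True, exist_ok=True)
        hf_hub_download(repo_id=repo, filename=name, local_dir=target_dir)
```

Calling `fetch_all("path/to/ComfyUI")` with your own installation path downloads both files into place. `plan_downloads` can be used on its own to check the target folders first.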

## Model Details

* **Architecture:** An 8.9B parameter model based on FLUX.1-schnell.
* **Training Data:** Trained on a 5M sample dataset curated from a 20M pool, including artistic, photographic, and niche styles.
* **Technical Report:** A comprehensive technical paper detailing the architectural modifications and training process is forthcoming.

## Intended Use

Chroma is intended to be used as a **base model** for researchers and developers to build upon. It is ideal for:

* Finetuning on specific styles, concepts, or characters.
* Research into generative model behavior, alignment, and safety.
* Use as a foundational component in larger AI systems.

## Limitations and Bias Statement

Chroma is trained on a broad, filtered dataset from the internet. As such, it may reflect the biases and stereotypes present in its training data. The model is released as is and has not been aligned with a specific safety filter.

Users are responsible for their own use of this model. It has the potential to generate content that may be considered harmful, explicit, or offensive. I encourage developers to implement appropriate safeguards and ethical considerations in their downstream applications.

## Summary of Architectural Modifications

*(For a full breakdown, see the upcoming tech report.)*

* **12B → 8.9B Parameters:**
    * **TL;DR:** I replaced a 3.3B parameter timestep-encoding layer with a more efficient 250M parameter FFN, as the original was vastly oversized for its task. This swap accounts for the drop from 12B to roughly 8.9B parameters.
* **MMDiT Masking:**
    * **TL;DR:** Masking T5 padding tokens improved fidelity and training stability by preventing the model from attending to irrelevant `<pad>` tokens.
* **Custom Timestep Distributions:**
    * **TL;DR:** I implemented a custom timestep sampling distribution (`-x^2`) to prevent loss spikes and ensure the model trains effectively on both high-noise and low-noise regions.
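
To make the padding-mask idea concrete, here is a minimal sketch in plain Python. It illustrates the general technique under stated assumptions and is not the Chroma training code: it assumes that T5's `<pad>` token has id 0, that text and image tokens are concatenated into one sequence for joint attention, and that masking is applied as an additive bias on the attention logits. All function names are hypothetical.

```python
import math

T5_PAD_ID = 0  # T5 tokenizers use id 0 for <pad>


def keep_mask(text_token_ids, num_image_tokens, pad_id=T5_PAD_ID):
    """True for positions that may be attended to, False for text padding."""
    text_keep = [tok != pad_id for tok in text_token_ids]
    # Image tokens are never padding, so they are always kept.
    return text_keep + [True] * num_image_tokens


def attention_bias(mask):
    """Additive bias per key position: 0.0 to keep, -inf to mask out."""
    return [0.0 if keep else -math.inf for keep in mask]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three non-pad text tokens (the last one, id 1, is T5's end-of-sequence
# token), two pads, then two image tokens.
ids = [37, 1712, 1, 0, 0]
mask = keep_mask(ids, num_image_tokens=2)
bias = attention_bias(mask)
# Uniform logits plus the bias: padded keys receive zero attention weight.
weights = softmax([0.0 + b for b in bias])
```

With the bias applied, the two padded positions get exactly zero weight and the remaining five positions share the attention equally, so the padding cannot influence the output.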

## P.S.

Chroma1-HD is Chroma-v.50.

## Citation

```
@misc{rock2025chroma,
  author = {Lodestone Rock},
  title = {Chroma1-HD},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/lodestones/Chroma1-HD}},
}
```