# Chroma1-HD
Chroma1-HD is an 8.9B parameter text-to-image foundational model based on FLUX.1-schnell. It is fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build upon it.
As a base model, Chroma1 is intentionally designed to be an excellent starting point for finetuning. It provides a strong, neutral foundation for developers, researchers, and artists to create specialized models.
For the fast CFG-"baked" version, please see Chroma1-Flash.
## Key Features
- High-Performance Base: 8.9B parameters, built on the powerful FLUX.1 architecture.
- Easily Finetunable: Designed as an ideal checkpoint for creating custom, specialized models.
- Community-Driven & Open-Source: Fully transparent, with an Apache 2.0 license and open training history.
- Flexible by Design: Provides a flexible foundation for a wide range of generative tasks.
## Special Thanks
A massive thank you to our supporters who make this project possible.
- Anonymous donor whose incredible generosity funded the pretraining run and data collection. Your support has been transformative for open-source AI.
- Fictional.ai for their fantastic support and for helping push the boundaries of open-source AI. You can try Chroma on their platform.
## How to Use

### diffusers library
```python
import torch
from diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-HD", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # offload idle submodules to CPU to reduce VRAM usage

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigured, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
    num_images_per_prompt=1,
).images[0]
image.save("chroma.png")
```
### ComfyUI

For advanced users and customized workflows, you can use Chroma with ComfyUI.
Requirements:
- A working ComfyUI installation.
- Chroma checkpoint (latest version).
- T5 XXL Text Encoder.
- FLUX VAE.
- Chroma Workflow JSON.
Setup:
- Place the `T5_xxl` model in your `ComfyUI/models/clip` folder.
- Place the `FLUX VAE` in your `ComfyUI/models/vae` folder.
- Place the `Chroma checkpoint` in your `ComfyUI/models/diffusion_models` folder.
- Load the Chroma workflow file into ComfyUI and run it.
## Model Details
- Architecture: Based on the 8.9B parameter FLUX.1-schnell model.
- Training Data: Trained on a 5M sample dataset curated from a 20M pool, including artistic, photographic, and niche styles.
- Technical Report: A comprehensive technical paper detailing the architectural modifications and training process is forthcoming.
## Intended Use
Chroma is intended to be used as a base model for researchers and developers to build upon. It is ideal for:
- Finetuning on specific styles, concepts, or characters (see the sketch after this list).
- Research into generative model behavior, alignment, and safety.
- Serving as a foundational component in larger AI systems.
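For the finetuning use case, a common starting point is attaching LoRA adapters to the transformer with `diffusers` and `peft`. This is a minimal sketch, not an official recipe; the rank and target module names are illustrative assumptions.

```python
import torch
from diffusers import ChromaPipeline
from peft import LoraConfig

# Load the pipeline and pull out the transformer, the component
# that is typically finetuned.
pipe = ChromaPipeline.from_pretrained(
    "lodestones/Chroma1-HD", torch_dtype=torch.bfloat16
)
transformer = pipe.transformer

# Freeze the base weights, then attach LoRA adapters. Rank and
# target modules are illustrative assumptions; tune them for your dataset.
transformer.requires_grad_(False)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)

trainable = sum(p.numel() for p in transformer.parameters() if p.requires_grad)
print(f"Trainable LoRA parameters: {trainable / 1e6:.1f}M")
```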
## Limitations and Bias Statement
Chroma is trained on a broad, filtered dataset from the internet. As such, it may reflect the biases and stereotypes present in its training data. The model is released as-is and has not been aligned with a specific safety filter.
Users are responsible for their own use of this model. It has the potential to generate content that may be considered harmful, explicit, or offensive. I encourage developers to implement appropriate safeguards and ethical considerations in their downstream applications.
## Summary of Architectural Modifications
(For a full breakdown, tech report soon-ish.)
- 12B → 8.9B Parameters:
  - TL;DR: I replaced a 3.3B parameter timestep-encoding layer with a more efficient 250M parameter FFN, as the original was vastly oversized for its task (sketched after this list).
- MMDiT Masking:
  - TL;DR: Masking T5 padding tokens enhanced fidelity and increased training stability by preventing the model from focusing on irrelevant `<pad>` tokens (sketched after this list).
- Custom Timestep Distributions:
  - TL;DR: I implemented a custom timestep sampling distribution (`-x^2`) to prevent loss spikes and ensure the model trains effectively on both high-noise and low-noise regions (sketched after this list).
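To make the first change concrete, here is a minimal sketch of the idea: replace a huge timestep/modulation encoding stack with a small shared FFN. Module names and dimensions are illustrative assumptions, not Chroma's actual internals.

```python
import torch
import torch.nn as nn

class TimestepFFN(nn.Module):
    # Small FFN mapping a sinusoidal timestep embedding to the modulation
    # vector consumed by the DiT blocks. The point of the change: a compact
    # shared network can do the job of a multi-billion-parameter encoding
    # stack. Dimensions here are illustrative, not Chroma's actual ones.
    def __init__(self, embed_dim: int = 256, hidden_dim: int = 5120, mod_dim: int = 3072):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, mod_dim),
        )

    def forward(self, t_emb: torch.Tensor) -> torch.Tensor:
        # t_emb: (batch, embed_dim) sinusoidal embedding of the timestep
        return self.net(t_emb)
```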
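The MMDiT masking change can be sketched as building a key mask so that no token attends to `<pad>` text tokens. Shapes and the function itself are hypothetical; the real logic lives inside the attention blocks.

```python
import torch
import torch.nn.functional as F

def joint_attention_with_pad_mask(q, k, v, text_token_mask, image_len):
    # q, k, v: (batch, heads, text_len + image_len, head_dim); text tokens first.
    # text_token_mask: (batch, text_len) bool from the T5 tokenizer,
    # True for real tokens, False for <pad>.
    batch = text_token_mask.shape[0]
    # Image tokens are always attendable; <pad> text tokens are masked out.
    key_mask = torch.cat(
        [text_token_mask, text_token_mask.new_ones(batch, image_len)], dim=1
    )
    # Broadcast over heads and query positions; True = may be attended to.
    attn_mask = key_mask[:, None, None, :]
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```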
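Finally, the custom timestep distribution can be sketched with rejection sampling. The exact `-x^2` curve is not published yet (tech report pending), so the quadratic below, which keeps extra mass near the high-noise and low-noise ends, is purely a placeholder assumption.

```python
import torch

def sample_timesteps(batch_size: int, generator=None) -> torch.Tensor:
    # Rejection-sample t in [0, 1] from an unnormalized quadratic density.
    # Placeholder shape: upweights both tails relative to uniform sampling,
    # matching the stated goal of covering high- and low-noise regions.
    def density(t):
        # Values in (0.5, 1]; maximum of 1 at t = 0 and t = 1.
        return 0.5 * (1.0 + (2.0 * t - 1.0) ** 2)

    samples = torch.empty(0)
    while samples.numel() < batch_size:
        t = torch.rand(batch_size, generator=generator)
        u = torch.rand(batch_size, generator=generator)
        samples = torch.cat([samples, t[u < density(t)]])
    return samples[:batch_size]
```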
P.S. Chroma1-HD is Chroma-v.50.
## Citation

```bibtex
@misc{rock2025chroma,
  author = {Lodestone Rock},
  title = {Chroma1-HD},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/lodestones/Chroma1-HD}},
}
```