	---
	license: apache-2.0
	pipeline_tag: text-to-image
	---
	# Chroma1-HD

	Chroma1-HD is an 8.9B-parameter text-to-image foundation model based on FLUX.1-schnell. It is fully Apache 2.0 licensed, so anyone can use, modify, and build upon it.

	As a base model, Chroma1 is intentionally designed to be an excellent starting point for finetuning. It provides a strong, neutral foundation for developers, researchers, and artists to create specialized models.

	For the fast, CFG-"baked" version, see [Chroma1-Flash](https://huggingface.co/lodestones/Chroma1-Flash).

	### Key Features
	* High-Performance Base: 8.9B parameters, built on the powerful FLUX.1 architecture.
	* Easily Finetunable: Designed as an ideal checkpoint for creating custom, specialized models.
	* Community-Driven & Open-Source: Fully transparent, with an Apache 2.0 license and a visible training history.
	* Flexible by Design: Provides an adaptable foundation for a wide range of generative tasks.

	## Special Thanks
	A massive thank you to our supporters who make this project possible.
	* An anonymous donor whose incredible generosity funded the pretraining run and data collection. Your support has been transformative for open-source AI.
	* Fictional.ai for their fantastic support and for helping push the boundaries of open-source AI. You can try Chroma on their platform:

	[![FictionalChromaBanner_1.png](./images/FictionalChromaBanner_1.png)](https://fictional.ai/?ref=chroma_hf)

	## How to Use

	### `diffusers` Library

	```python
	import torch
	from diffusers import ChromaPipeline

	pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-HD", torch_dtype=torch.bfloat16)
	pipe.enable_model_cpu_offload()

	prompt = [
	    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
	]
	negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]

	image = pipe(
	    prompt=prompt,
	    negative_prompt=negative_prompt,
	    generator=torch.Generator("cpu").manual_seed(433),
	    num_inference_steps=40,
	    guidance_scale=3.0,
	    num_images_per_prompt=1,
	).images[0]
	image.save("chroma.png")
	```

	### ComfyUI
	For advanced users and customized workflows, you can use Chroma with ComfyUI.

	Requirements:
	* A working ComfyUI installation.
	* [Chroma checkpoint](https://huggingface.co/lodestones/Chroma) (latest version).
	* [T5 XXL Text Encoder](https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors).
	* [FLUX VAE](https://huggingface.co/lodestones/Chroma/resolve/main/ae.safetensors).
	* [Chroma Workflow JSON](https://huggingface.co/lodestones/Chroma/resolve/main/ChromaSimpleWorkflow20250507.json).

	Setup:
	1. Place the `T5_xxl` model in your `ComfyUI/models/clip` folder.
	2. Place the `FLUX VAE` in your `ComfyUI/models/vae` folder.
	3. Place the `Chroma checkpoint` in your `ComfyUI/models/diffusion_models` folder.
	4. Load the Chroma workflow file into ComfyUI and run.
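
The folder layout from the setup steps can be checked with a small script before launching ComfyUI. This is a minimal sketch, not part of Chroma or ComfyUI: the function names `plan_placement` and `missing_files` and the role labels are hypothetical, and the checkpoint filename in the usage note is a placeholder, since the actual filename depends on which version you download.

```python
from pathlib import Path

# Destination subfolders, taken from the setup steps above.
DESTINATIONS = {
    "text_encoder": Path("models") / "clip",
    "vae": Path("models") / "vae",
    "checkpoint": Path("models") / "diffusion_models",
}


def plan_placement(comfyui_root, files):
    """Map each downloaded file to the path it should have inside ComfyUI.

    `files` maps a role ("text_encoder", "vae", "checkpoint") to the
    location of the downloaded file. Nothing is moved or copied here.
    """
    root = Path(comfyui_root)
    plan = {}
    for role, source in files.items():
        if role not in DESTINATIONS:
            raise ValueError(f"unknown role: {role!r}")
        source = Path(source)
        plan[source] = root / DESTINATIONS[role] / source.name
    return plan


def missing_files(plan):
    """Return the planned destinations that do not exist on disk yet."""
    return [dst for dst in plan.values() if not dst.exists()]
```

For example, `plan_placement("ComfyUI", {"text_encoder": "t5xxl_fp16.safetensors", "vae": "ae.safetensors", "checkpoint": "chroma_checkpoint.safetensors"})` returns the three target paths, and `missing_files` on that result lists whichever files still need to be put in place.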

	## Model Details
	* Architecture: 8.9B parameters, derived from the 12B parameter FLUX.1-schnell model.
	* Training Data: Trained on a 5M sample dataset curated from a 20M pool, including artistic, photographic, and niche styles.
	* Technical Report: A comprehensive technical paper detailing the architectural modifications and training process is forthcoming.
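
The headline numbers in this README can be cross-checked with simple arithmetic. The sketch below uses only the rounded figures quoted in this document (a 12B original model, a 3.3B layer replaced by a 250M FFN, and 5M samples curated from a 20M pool), so the results are approximate; the variable names are illustrative.

```python
# Rounded figures quoted in this README, in billions of parameters.
FLUX_SCHNELL_PARAMS_B = 12.0     # original model size
REMOVED_LAYER_PARAMS_B = 3.3     # timestep-encoding layer that was replaced
REPLACEMENT_FFN_PARAMS_B = 0.25  # the 250M parameter FFN that replaced it

chroma_params_b = (
    FLUX_SCHNELL_PARAMS_B - REMOVED_LAYER_PARAMS_B + REPLACEMENT_FFN_PARAMS_B
)

# Dataset figures, in millions of samples.
POOL_SAMPLES_M = 20
CURATED_SAMPLES_M = 5
kept_fraction = CURATED_SAMPLES_M / POOL_SAMPLES_M

print(f"estimated size: {chroma_params_b:.2f}B parameters")
print(f"curated fraction of the pool: {kept_fraction:.0%}")
```

The estimate comes out at about 8.95B, which agrees with the stated 8.9B given that the inputs are themselves rounded, and the curated set is a quarter of the source pool.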

	## Intended Use
	Chroma is intended to be used as a base model for researchers and developers to build upon. It is ideal for:
	* Finetuning on specific styles, concepts, or characters.
	* Research into generative model behavior, alignment, and safety.
	* Serving as a foundational component in larger AI systems.

	## Limitations and Bias Statement
	Chroma is trained on a broad, filtered dataset from the internet. As such, it may reflect the biases and stereotypes present in its training data. The model is released as is and has not been aligned with a specific safety filter.

	Users are responsible for their own use of this model. It has the potential to generate content that may be considered harmful, explicit, or offensive. I encourage developers to implement appropriate safeguards and ethical considerations in their downstream applications.

	## Summary of Architectural Modifications
	(A full breakdown will appear in the forthcoming technical report.)

	* 12B → 8.9B Parameters:
	    * TL;DR: I replaced a 3.3B parameter timestep-encoding layer with a more efficient 250M parameter FFN, as the original was vastly oversized for its task.
	* MMDiT Masking:
	    * TL;DR: Masking T5 padding tokens enhanced fidelity and increased training stability by preventing the model from focusing on irrelevant `<pad>` tokens.
	* Custom Timestep Distributions:
	    * TL;DR: I implemented a custom timestep sampling distribution (`-x^2`) to prevent loss spikes and ensure the model trains effectively on both high-noise and low-noise regions.
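
To illustrate the padding-mask idea from the list above, here is a minimal pure-Python sketch of attention weights computed with and without `<pad>` positions. It is an assumption-laden illustration rather than Chroma's actual implementation, which may differ in detail (for example, in exactly which padding positions are masked). The helper names and the example token ids are hypothetical; the only fact relied on is that T5 uses id 0 for `<pad>`.

```python
import math


def padding_mask(token_ids, pad_id):
    """Return True for real tokens and False for padding tokens."""
    return [tok != pad_id for tok in token_ids]


def masked_softmax(scores, mask):
    """Softmax over `scores` where masked-out positions get zero weight."""
    kept = [s for s, keep in zip(scores, mask) if keep]
    if not kept:
        raise ValueError("mask removes every position")
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


PAD_ID = 0                          # T5's <pad> token id
token_ids = [71, 1712, 1, 0, 0]     # illustrative ids; last two are padding
scores = [1.0, 2.0, 0.5, 3.0, 3.0]  # made-up attention logits for one query

mask = padding_mask(token_ids, PAD_ID)
weights = masked_softmax(scores, mask)
print(weights)
```

In this toy example the two padding positions carry the largest raw scores, so without the mask they would dominate the attention distribution. With the mask applied, their weight is exactly zero and the remaining weights are renormalized over the real tokens.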

	## P.S.
	Chroma1-HD is Chroma-v.50.

	## Citation
	```
	@misc{rock2025chroma,
	    author = {Lodestone Rock},
	    title = {Chroma1-HD},
	    year = {2025},
	    publisher = {Hugging Face},
	    journal = {Hugging Face repository},
	    howpublished = {\url{https://huggingface.co/lodestones/Chroma1-HD}},
	}
	```