license: apache-2.0
pipeline_tag: text-to-image

Chroma1-HD

Chroma1-HD is an 8.9B-parameter text-to-image foundation model based on FLUX.1-schnell. It is fully Apache 2.0 licensed, so anyone can use, modify, and build upon it.

As a base model, Chroma1 is intentionally designed to be an excellent starting point for finetuning. It provides a strong, neutral foundation for developers, researchers, and artists to create specialized models.

Key Features

  • High-Performance Base: 8.9B parameters, built on the powerful FLUX.1 architecture.
  • Easily Finetunable: Designed as an ideal checkpoint for creating custom, specialized models.
  • Community-Driven & Open-Source: Fully transparent, with an Apache 2.0 license and training history.
  • Flexible by Design: Provides a foundation for a wide range of generative tasks.

How to Use

diffusers Library

import torch
from diffusers import ChromaPipeline

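# Load the weights in bfloat16 to reduce memory use.
pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-HD", torch_dtype=torch.bfloat16)
# Keep sub-models on the CPU and move each to the GPU only while it runs.
pipe.enable_model_cpu_offload()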

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]

# Generate one image; the fixed seed makes the result reproducible.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
    num_images_per_prompt=1,
).images[0]
image.save("chroma.png")
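The call above passes both a negative_prompt and a guidance_scale. A minimal sketch of how those two inputs are usually combined, assuming standard classifier-free guidance (the function name `apply_cfg` and the toy numbers are illustrative, not taken from the diffusers source):

```python
def apply_cfg(cond, uncond, guidance_scale):
    """Combine two noise predictions with classifier-free guidance.

    cond:   prediction conditioned on the prompt
    uncond: prediction conditioned on the negative prompt
    The result moves away from `uncond` toward `cond`, scaled by guidance_scale.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy 2-element "predictions" standing in for full latent tensors.
guided = apply_cfg(cond=[1.0, 2.0], uncond=[0.0, 1.0], guidance_scale=3.0)
print(guided)
```

Under this assumption, a guidance_scale of 1.0 returns the prompt-conditioned prediction unchanged, and larger values push the result further from whatever the negative prompt describes.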

ComfyUI

For advanced users and customized workflows, you can use Chroma with ComfyUI.

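Requirements:

  • A working ComfyUI installation.
  • The T5_xxl model.
  • The FLUX VAE.
  • The Chroma checkpoint.
  • The Chroma workflow file.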

Setup:

  1. Place the T5_xxl model in your ComfyUI/models/clip folder.
  2. Place the FLUX VAE in your ComfyUI/models/vae folder.
  3. Place the Chroma checkpoint in your ComfyUI/models/diffusion_models folder.
  4. Load the Chroma workflow file into ComfyUI and run.
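Before loading the workflow, it can help to confirm that the three folders from the steps above each contain a file. A minimal sketch, assuming the ComfyUI root is passed in by the caller (the helper name `find_missing_models` and the labels are hypothetical; only the folder paths come from the steps above):

```python
from pathlib import Path

# Folders named in the setup steps, relative to the ComfyUI root.
EXPECTED_FOLDERS = {
    "T5_xxl model": "models/clip",
    "FLUX VAE": "models/vae",
    "Chroma checkpoint": "models/diffusion_models",
}


def find_missing_models(comfy_root):
    """Return the labels whose folder is absent or contains no files."""
    root = Path(comfy_root)
    missing = []
    for label, relative in EXPECTED_FOLDERS.items():
        folder = root / relative
        has_file = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_file:
            missing.append(label)
    return missing
```

Calling `find_missing_models("ComfyUI")` returns an empty list when all three folders hold at least one file. It does not check that the file is the right model, only that something is there.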

Model Details

  • Architecture: 8.9B parameters, derived from the 12B-parameter FLUX.1-schnell model.
  • Training Data: Trained on a 5M-sample dataset curated from a 20M pool, including artistic, photographic, and niche styles.
  • Technical Report: A comprehensive technical paper detailing the architectural modifications and training process is forthcoming.
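The 8.9B parameter count can be cross-checked against the figures in the summary of architectural modifications further down: a 12B starting point, a 3.3B timestep-encoding layer removed, and a 250M FFN added. A minimal sketch of that arithmetic, using only those rounded figures (the names are illustrative):

```python
# Rounded figures quoted in this README, in billions of parameters.
FLUX_TOTAL_B = 12.0      # original FLUX.1 model
REMOVED_LAYER_B = 3.3    # timestep-encoding layer that was removed
ADDED_FFN_B = 0.25       # replacement FFN


def estimated_params_b(total=FLUX_TOTAL_B, removed=REMOVED_LAYER_B, added=ADDED_FFN_B):
    """Estimate the parameter count after swapping the layer for the FFN."""
    return total - removed + added


print(f"about {estimated_params_b():.2f}B parameters")
```

Because the inputs are rounded, the result (about 8.95B) only approximates the quoted 8.9B.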

Intended Use

Chroma is intended to be used as a base model for researchers and developers to build upon. It is ideal for:

  • Finetuning on specific styles, concepts, or characters.
  • Research into generative model behavior, alignment, and safety.
  • Use as a foundational component in larger AI systems.

Limitations and Bias Statement

Chroma is trained on a broad, filtered dataset from the internet. As such, it may reflect the biases and stereotypes present in its training data. The model is released in an uncensored state and has not been aligned with a specific safety filter.

Users are responsible for their own use of this model. It has the potential to generate content that may be considered harmful, explicit, or offensive. I encourage developers to implement appropriate safeguards and ethical considerations in their downstream applications.

Summary of Architectural Modifications

(For a full breakdown, tech report soon-ish.)

  • 12B → 8.9B Parameters:
    • TL;DR: I replaced a 3.3B parameter timestep-encoding layer with a more efficient 250M parameter FFN, as the original was vastly oversized for its task.
  • MMDiT Masking:
    • TL;DR: Masking T5 padding tokens enhanced fidelity and increased training stability by preventing the model from focusing on irrelevant <pad> tokens.
  • Custom Timestep Distributions:
    • TL;DR: I implemented a custom timestep sampling distribution (-x^2) to prevent loss spikes and ensure the model trains effectively on both high-noise and low-noise regions.
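To illustrate the MMDiT masking item, here is a minimal sketch of the general idea of masking padding in attention: padded positions get exactly zero weight. It assumes the T5 pad token id is 0. The token ids and scores are made up, and how Chroma applies the mask inside its MMDiT blocks is not specified in this README.

```python
import math

PAD_ID = 0  # assumption: T5 tokenizers use id 0 for <pad>


def padding_mask(token_ids, pad_id=PAD_ID):
    """Return True for real tokens and False for padding."""
    return [tok != pad_id for tok in token_ids]


def masked_softmax(scores, mask):
    """Softmax over `scores` that gives masked-out positions zero weight."""
    kept = [s for s, keep in zip(scores, mask) if keep]
    if not kept:
        raise ValueError("at least one position must be unmasked")
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative prompt: six real tokens followed by four <pad> tokens.
token_ids = [71, 1712, 30, 3, 9, 1, 0, 0, 0, 0]
# Illustrative attention scores; the padding happens to score highest.
scores = [0.2, 1.5, -0.3, 0.0, 0.7, 0.1, 2.0, 2.0, 2.0, 2.0]

weights = masked_softmax(scores, padding_mask(token_ids))
print([round(w, 3) for w in weights])
```

Without the mask, the four padding positions would take most of the attention weight in this toy example. With it, all of the weight goes to the real tokens.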

P.S.

Chroma1-HD is Chroma-v.50

Citation

@misc{rock2025chroma,
  author = {Lodestone Rock},
  title = {Chroma1-HD},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/lodestones/Chroma1-HD}},
}