metadata

license: apache-2.0
base_model:
  - Alpha-VLLM/Lumina-Image-2.0

Illustrious-Lumina-v0.03

This model is based on Alpha-VLLM/Lumina-Image-2.0 , which is nice small DiT model with minimal guaranteed functionality! Please refer to https://github.com/Alpha-VLLM/Lumina-Image-2.0 for official repository. Paper

Before we dive into the details of 'Illustrious-Lumina-v0.03', we’re excited to share that you can now generate images directly with our Illustrious XL models on our official site: illustrious-xl.ai.

We’ve launched a full image generation platform featuring high-res outputs, natural language prompting, and custom presets - plus, several exclusive models you won’t find on any other hub.

Explore our updated model tiers and naming here: Model Series.

Need help getting started? Check out our generation user guide: ILXL Image Generation User Guide.

1. Model Overview

Architecture: 2 B parameters DiT.
Text Encoder: Pure LLM, **Gemma-2-2b **
Goal of this fork: We test if the image backbone can learn illustration concepts without re‑training the LLM component.

Illustrious-Lumina-v0.03 is experimental epoch of Lumina-2.0 based training session, to validate whether we would be able to achieve small DiT model just with LLM - to be trained as illustration-focused model. The original model, is unfortunately bad at illustrations and lacked any of the knowledge - so the run focused on training abscent knowledges.

After 26,500 step, the model, Illustrious-Lumina-v0.03 has show successful fast adaptation toward the dataset.

However, please note that the original model is not good at illustrations, whileas our focus is only in illustrations - this would take a while to reach the certain level.

The examples are ready in Blog post.

To test the model, please refer to the huggingface space

If you prefer to run model locally, please use the pth file with official installation guide. The safetensors file is meant to only "contain the weights" - for comfyui-compatible format, we will try to prepare it as soon as possible.

2. Training Setup

Item	Value
Images Seen Total	22 M image–text pairs
Steps	26 500
Global batch	768
Resolution	1024, 256
Checkpoint	`Illustrious_Lumina_2b_22100_ema_unified_fp32.safetensors`

The model has seen 22M image-text pairs. To accelerate the training, multi-resolution training was utilized.

3. Inference Demo Code

If you prefer to run model locally, please use the pth file with official installation guide.

The setup used for header image can be replicated with following setup:

4. Disclaimer

The model does not reflect any final product, and intended to be used for research analysis only. The model is not production-ready; use as own risk.

The model is in Proof Of Concept stage- supposedly, 3% of the compute required for full training, with only 22M samples seen with low-resolution joint training, with A6000 GPUs.

For training acceleration, please consider supporting us in Support site!