---
license: apache-2.0
base_model:
- Alpha-VLLM/Lumina-Image-2.0
---
# Illustrious-Lumina-v0.03
This model is based on Alpha-VLLM/Lumina-Image-2.0, a nice, small DiT model with minimal guaranteed functionality! Please refer to https://github.com/Alpha-VLLM/Lumina-Image-2.0 for the official repository.
[Paper](https://arxiv.org/abs/2503.21758)
---
Before we dive into the details of 'Illustrious-Lumina-v0.03', we’re excited to share that you can now generate images directly with our Illustrious XL models on our official site: [illustrious-xl.ai](http://illustrious-xl.ai/).
We’ve launched a full image generation platform featuring high-res outputs, natural language prompting, and custom presets - plus, several exclusive models you won’t find on any other hub.
Explore our updated model tiers and naming here: [Model Series](https://www.illustrious-xl.ai/updates/20).
Need help getting started? Check out our generation user guide: [ILXL Image Generation User Guide](https://www.illustrious-xl.ai/updates/21).
---

## 1. Model Overview
- **Architecture**: **2B-parameter** DiT.
- **Text Encoder**: pure LLM, **Gemma-2-2b** (see the loading sketch after this list).
- **Goal of this fork**: test whether the image backbone can learn illustration concepts **without** re‑training the LLM component.
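As a rough illustration of the text-encoder path, the sketch below loads Gemma-2-2b with `transformers` and extracts its last hidden states as prompt embeddings. This is a minimal sketch, not the exact conditioning pipeline used in training; note that the `google/gemma-2-2b` checkpoint is gated and requires accepting its license on the Hub.

```python
# Minimal sketch: using Gemma-2-2b as a pure-LLM text encoder.
# Assumption: the DiT consumes the LLM's last hidden states as prompt
# embeddings; the exact conditioning used in this fork may differ.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "google/gemma-2-2b"  # gated; accept the license on the Hub first
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id, torch_dtype=torch.bfloat16)
encoder.eval()

prompt = "1girl, silver hair, looking at viewer, masterpiece"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**inputs, output_hidden_states=True).hidden_states[-1]

print(hidden.shape)  # (batch, sequence_length, hidden_size) -> text conditioning
```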
---
**Illustrious-Lumina-v0.03** is an experimental epoch from a Lumina-2.0-based training session, run to validate whether a small DiT with a pure-LLM text encoder can be trained into an illustration-focused model.
The original model is unfortunately weak at illustrations and lacked the relevant knowledge, so this run focused on training in that absent knowledge.
After 26,500 steps, Illustrious-Lumina-v0.03 has shown fast, successful adaptation to the dataset.
However, please note that the original model is not good at illustrations, while our focus is solely on illustrations, so reaching a certain level of quality will take a while.
Examples are available in the [blog post](https://www.illustrious-xl.ai/blog).
To test the model, please refer to the [Hugging Face Space](https://huggingface.co/spaces/AngelBottomless/Lumina-Illustrious-v0.03).
If you prefer to run the model locally, please use the **pth file** with the [official installation guide](https://github.com/OnomaAI/Illustrious-Lumina).
**The safetensors file is meant to only "contain the weights"; we will try to prepare a ComfyUI-compatible format as soon as possible.**
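Since the safetensors file carries only the raw weights, you can still inspect it as a plain state dict. A minimal sketch, assuming the checkpoint filename from the table below; wiring the weights into an inference graph still requires the official code.

```python
# Minimal sketch: reading the raw state dict from the safetensors file.
# Assumption: the file is a flat tensor dict, not a full pipeline checkpoint.
from safetensors.torch import load_file

state_dict = load_file("Illustrious_Lumina_2b_22100_ema_unified_fp32.safetensors")

# Inspect a few parameter names and shapes to verify the download.
for name, tensor in list(state_dict.items())[:5]:
    print(f"{name}: {tuple(tensor.shape)} {tensor.dtype}")

total = sum(t.numel() for t in state_dict.values())
print(f"total parameters: {total / 1e9:.2f}B")
```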
## 2. Training Setup
| Item | Value |
|------|-------|
| Total images seen | 22M image–text pairs |
| Steps | 26,500 |
| Global batch size | 768 |
| Resolution | 1024 / 256 (multi-resolution) |
| Checkpoint | `Illustrious_Lumina_2b_22100_ema_unified_fp32.safetensors` |
In total, the model has seen 22M image–text pairs. To accelerate training, multi-resolution joint training was utilized; a simplified bucketing sketch follows.
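The block below illustrates what multi-resolution joint training can look like in practice: each sample is assigned to a 1024 or 256 bucket and batches are drawn within one bucket so tensor shapes stay uniform. This is a hypothetical simplification, not the actual training loader used for this run.

```python
# Hypothetical sketch of multi-resolution bucketing: every batch comes
# from a single resolution bucket so tensor shapes stay uniform.
import random
from collections import defaultdict

RESOLUTIONS = [1024, 256]  # joint training resolutions from the table above

def assign_bucket(width: int, height: int) -> int:
    """Pick the training resolution closest to the image's longer side."""
    longer = max(width, height)
    return min(RESOLUTIONS, key=lambda r: abs(r - longer))

def make_batches(samples, batch_size):
    """Group (width, height, path) samples into same-resolution batches."""
    buckets = defaultdict(list)
    for sample in samples:
        buckets[assign_bucket(sample[0], sample[1])].append(sample)
    batches = []
    for resolution, items in buckets.items():
        random.shuffle(items)
        for i in range(0, len(items), batch_size):
            batches.append((resolution, items[i:i + batch_size]))
    random.shuffle(batches)  # mix resolutions across training steps
    return batches

batches = make_batches([(1536, 1024, "a.png"), (300, 256, "b.png")], batch_size=1)
print(batches)
```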
## 3. Inference Demo Code
As noted above, local inference currently goes through the **pth file** and the [official installation guide](https://github.com/OnomaAI/Illustrious-Lumina).
The setup used for the header image can be replicated with a configuration like the following:
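Since no demo snippet ships with this card yet, the block below is a hedged sketch of one way to run the base architecture through `diffusers` (`Lumina2Pipeline`, available in recent releases). It loads the upstream Alpha-VLLM/Lumina-Image-2.0 weights; swapping in this fork's checkpoint still goes through the pth file and the official guide above. The prompt and sampler settings are placeholders, not the exact header-image setup.

```python
# Hedged sketch: running the base Lumina-Image-2.0 architecture via diffusers.
# Assumption: a recent diffusers release (>= 0.33) provides Lumina2Pipeline;
# the prompt and sampler settings below are placeholders, not the header setup.
import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = pipe(
    prompt="1girl, silver hair, detailed illustration, masterpiece",
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]

image.save("illustrious_lumina_demo.png")
```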

## 4. Disclaimer
The model does not reflect any final product and is intended for research analysis only. The model is not production-ready; use at your own risk.
The model is at the proof-of-concept stage: roughly 3% of the compute required for full training, with only 22M samples seen under low-resolution joint training on A6000 GPUs.
To accelerate training, please consider supporting us on our [support site](https://illustrious-xl.ai/model/17)!