|
--- |
|
license: apache-2.0 |
|
base_model: |
|
- Alpha-VLLM/Lumina-Image-2.0 |
|
--- |
|
# Illustrious-Lumina-v0.03 |
|
|
|
This model is based on Alpha-VLLM/Lumina-Image-2.0, a nice, small DiT model with minimal guaranteed functionality! Please refer to https://github.com/Alpha-VLLM/Lumina-Image-2.0 for the official repository.
|
[Paper](https://arxiv.org/abs/2503.21758) |
|
|
|
--- |
|
Before we dive into the details of 'Illustrious-Lumina-v0.03', we’re excited to share that you can now generate images directly with our Illustrious XL models on our official site: [illustrious-xl.ai](http://illustrious-xl.ai/). |
|
|
|
We’ve launched a full image generation platform featuring high-res outputs, natural-language prompting, and custom presets, plus several exclusive models you won’t find on any other hub.
|
|
|
Explore our updated model tiers and naming here: [Model Series](https://www.illustrious-xl.ai/updates/20). |
|
|
|
Need help getting started? Check out our generation user guide: [ILXL Image Generation User Guide](https://www.illustrious-xl.ai/updates/21). |
|
|
|
--- |
|
|
|
 |
|
|
|
## 1. Model Overview |
|
- **Architecture**: **2B-parameter** DiT.

- **Text Encoder**: pure LLM, **Gemma-2-2b**.

- **Goal of this fork**: we test whether the image backbone can learn illustration concepts **without** re-training the LLM component (see the sketch below).
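To make the stated goal concrete, here is a minimal PyTorch sketch of "training the backbone without re-training the LLM", assuming a Lumina-2.0-style module layout; the attribute names `text_encoder` and `transformer` and the optimizer hyperparameters are illustrative, not the run's actual values.

```python
import torch

def build_optimizer(text_encoder: torch.nn.Module, transformer: torch.nn.Module):
    # Freeze the Gemma-2-2b text encoder: no gradients, eval mode.
    text_encoder.requires_grad_(False)
    text_encoder.eval()

    # Only the DiT image backbone is optimized. The lr / weight decay
    # below are placeholders, not the disclosed training hyperparameters.
    trainable = [p for p in transformer.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-4, weight_decay=0.01)
```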
|
|
|
--- |
|
**Illustrious-Lumina-v0.03** is an experimental epoch of a Lumina-2.0-based training session, run to validate whether a small DiT model with a pure-LLM text encoder can be trained into an illustration-focused model.

The original model is unfortunately weak at illustrations and lacks the relevant domain knowledge, so this run focused on training in the absent concepts.
|
|
|
After 26,500 steps, Illustrious-Lumina-v0.03 has shown fast and successful adaptation to the dataset.
|
|
|
However, please note that the original model is not good at illustrations, whereas our focus is solely on illustrations, so reaching a satisfactory level will take a while.
|
|
|
Example outputs are available in the [blog post](https://www.illustrious-xl.ai/blog).
|
|
|
To test the model, please refer to the [Hugging Face Space](https://huggingface.co/spaces/AngelBottomless/Lumina-Illustrious-v0.03).
|
|
|
If you prefer to run the model locally, please use the **.pth file** with the [official installation guide](https://github.com/OnomaAI/Illustrious-Lumina).

**The safetensors file is only meant to "contain the weights"; we will try to prepare a ComfyUI-compatible format as soon as possible.**
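Since the safetensors file carries only a raw state dict, it can at least be inspected (or loaded into a matching model definition) with the `safetensors` library. A minimal sketch, assuming the keys follow the official Lumina-2.0 model class, which is not guaranteed here:

```python
from safetensors.torch import load_file

# Raw fp32 EMA weights only: no config, scheduler, or pipeline metadata.
state_dict = load_file("Illustrious_Lumina_2b_22100_ema_unified_fp32.safetensors")
print(f"{len(state_dict)} tensors; first key: {next(iter(state_dict))}")

# model.load_state_dict(state_dict)  # `model` must be the matching Lumina-2.0 architecture
```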
|
|
|
|
|
## 2. Training Setup |
|
| Item | Value |
|------|-------|
| Images seen (total) | 22M image–text pairs |
| Steps | 26,500 |
| Global batch size | 768 |
| Resolutions | 1024, 256 |
| Checkpoint | `Illustrious_Lumina_2b_22100_ema_unified_fp32.safetensors` |
|
|
|
The model has seen 22M image–text pairs in total. To accelerate training, multi-resolution training was used, as sketched below.
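As a rough illustration of the idea (not this run's actual data sampler), multi-resolution training typically draws each global batch at a single resolution so that the cheap 256-px steps dominate wall-clock time; the sampling weights below are made-up values:

```python
import random

RESOLUTIONS = [256, 1024]   # the two training resolutions from the table
LOW_RES_PROB = 0.8          # illustrative assumption, not a disclosed hyperparameter

def sample_batch_resolution() -> int:
    # All samples in a global batch share one resolution, so batches at
    # 256 px run far faster and can make up most of the schedule.
    return random.choices(RESOLUTIONS, weights=[LOW_RES_PROB, 1 - LOW_RES_PROB])[0]
```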
|
|
|
## 3. Inference Demo Code |
|
As noted above, running the model locally requires the **.pth file** together with the [official installation guide](https://github.com/OnomaAI/Illustrious-Lumina).
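Until the ComfyUI-compatible export is ready, the base model can be sampled through `diffusers` as a stand-in. A minimal sketch, assuming diffusers ≥ 0.33 (where `Lumina2Pipeline` is available); note it loads the original Alpha-VLLM checkpoint, since this card's safetensors weights would first need to be mapped onto the pipeline's transformer:

```python
import torch
from diffusers import Lumina2Pipeline

# Base Lumina-Image-2.0 pipeline: Gemma-2-2b text encoder + 2B-parameter DiT.
pipe = Lumina2Pipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a detailed anime-style illustration of a girl in a flower field",
    height=1024,
    width=1024,
    num_inference_steps=40,   # illustrative settings, not the header image's
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("sample.png")
```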
|
|
|
|
|
The settings used for the header image can be replicated with the following setup:
|
|
|
 |
|
|
|
## 4. Disclaimer |
|
The model does not reflect any final product and is intended for research analysis only. It is not production-ready; use at your own risk.
|
|
|
The model is at the proof-of-concept stage: it has received an estimated 3% of the compute required for full training, with only 22M samples seen under low-resolution joint training on A6000 GPUs.
|
|
|
To help accelerate training, please consider supporting us on the [support site](https://illustrious-xl.ai/model/17)!