ProteusV0.5
ProteusV0.5 is the latest full release of my AI image generation model, built as a sophisticated enhancement over OpenDalleV1.1. This version brings significant improvements in photorealism, prompt comprehension, and stylistic capabilities across various domains.
About Proteus
Proteus leverages and enhances the core functionalities of OpenDalleV1.1 to deliver superior outcomes. Key areas of advancement include heightened responsiveness to prompts and augmented creative capacities. The model has been fine-tuned on a carefully curated dataset of copyright-free stock images and high-quality AI-generated image pairs.
Key Improvements in V0.5:
Advanced Custom CLIP Integration:
Incorporates a meticulously trained custom CLIP model
Steadily developed over an extended period
Further fine-tuned for specific qualities in Proteus and Prometheus
Estimated to contribute 90% of the model's performance improvements
Requires a clip skip setting of 2 for optimal performance (see the sketch after this list)
Further Refinement of Stylistic Capabilities:
Enhanced ability to generate diverse artistic styles
Improved coherence in complex scenes and compositions
Expanded Training Dataset:
Now totaling over 400,000 images
Significantly broadened knowledge base and generation capabilities
Balanced Creativity and Accuracy:
Addressed previous issues of being "too stylistic/creative"
Improved alignment between user prompts and generated outputs
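"Clip skip" refers to conditioning generation on an earlier hidden layer of the CLIP text encoder rather than its final one; in A1111-style UIs, a clip skip of 2 is commonly read as using the penultimate layer, though exact indexing conventions differ between tools. Below is a minimal standalone sketch of that idea, using the generic openai/clip-vit-large-patch14 checkpoint purely for illustration (it is not Proteus's own text encoder, which ships inside the SDXL pipeline); in diffusers this behaviour is exposed through the clip_skip argument shown in the usage example further down.

import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Illustration only: a generic CLIP text encoder, not Proteus's bundled encoders
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

inputs = tokenizer(
    "a cat wearing sunglasses on the beach",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    outputs = text_encoder(**inputs, output_hidden_states=True)

# Default behaviour conditions on the final hidden layer; "clip skip 2" is
# commonly understood as conditioning on the penultimate layer instead
# (indexing conventions vary between UIs and libraries)
final_layer = outputs.hidden_states[-1]
penultimate_layer = outputs.hidden_states[-2]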
Proteus's Background
Proteus serves as a sophisticated enhancement over OpenDalleV1.1, leveraging its core functionalities to deliver superior outcomes. Key areas of advancement include heightened responsiveness to prompts and augmented creative capacities. To achieve this, it was fine-tuned on approximately 220,000 GPTV-captioned images from copyright-free stock photography (with some anime included), which were then normalized. Additionally, DPO (Direct Preference Optimization) was employed using a collection of 10,000 carefully selected high-quality, AI-generated image pairs. In pursuit of optimal performance, numerous LoRA (Low-Rank Adaptation) models were trained independently and then selectively incorporated into the principal model via dynamic application methods that target particular segments of the model while avoiding interference with other areas during the learning phase. Consequently, Proteus exhibits marked improvements in portraying intricate facial characteristics and lifelike skin textures, while sustaining commendable proficiency across various aesthetic domains, notably surrealism, anime, and cartoon-style visualizations.
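As an illustration of that merge-then-apply workflow, here is a minimal sketch using diffusers' LoRA utilities. The base checkpoint, adapter names, file paths, and weights are placeholders for illustration only, not the actual LoRAs used to build Proteus.

import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder base model; requires the peft library for multi-adapter handling
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
)

# Load independently trained LoRAs as named adapters (placeholder paths)
pipe.load_lora_weights("path/to/faces_lora", adapter_name="faces")
pipe.load_lora_weights("path/to/skin_texture_lora", adapter_name="skin")

# Weight each adapter, then fold the weighted deltas into the base weights
pipe.set_adapters(["faces", "skin"], adapter_weights=[0.8, 0.5])
pipe.fuse_lora()

# Save the merged result as a single checkpoint
pipe.save_pretrained("merged-model")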
Training Details
Total training dataset: over 400,000 images
Initial training: ~220,000 GPTV-captioned images from copyright-free stock photos (including some anime)
Additional training: hand-picked photorealistic images
Fine-tuning: Direct Preference Optimization (DPO) with 10,000 carefully selected high-quality, AI-generated image pairs
LoRA (Low-Rank Adaptation) models trained independently and selectively incorporated
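For readers unfamiliar with DPO in the diffusion setting, the sketch below shows the general shape of the preference objective as I understand it: the model is rewarded for denoising the preferred image of each pair better than a frozen reference model does, relative to the rejected image. Variable names and the beta value are illustrative, not the actual training configuration used here.

import torch
import torch.nn.functional as F

def diffusion_dpo_loss(err_win, err_lose, ref_err_win, ref_err_lose, beta=5000.0):
    # err_* are per-sample denoising MSEs for the preferred (win) and rejected (lose)
    # images under the model being trained; ref_err_* are the same MSEs under a
    # frozen reference model (e.g. the checkpoint before DPO)
    model_margin = err_win - err_lose
    ref_margin = ref_err_win - ref_err_lose
    # Push the model to improve on the preferred image more than on the rejected one
    return -F.logsigmoid(-beta * (model_margin - ref_margin)).mean()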
Improvements
Enhanced portrayal of intricate facial characteristics and lifelike skin textures
Improved proficiency in surrealism, anime, and cartoon-style visualizations
Superior prompt comprehension thanks to the custom-trained CLIP
Expanded dataset leading to more diverse and accurate outputs
Refined balance between creativity and accuracy
Recommended Settings
Clip Skip: 2
CFG Scale: 7
Steps: 25-50
Sampler: DPM++ 2M SDE
Scheduler: Karras
Resolution: 1024x1024
The custom-trained CLIP is a significant point of differentiation, as very few models incorporate this feature. Enjoy creating with the fully released ProteusV0.5!
Use it with 🧨 diffusers
import torch
from diffusers import (
StableDiffusionXLPipeline,
KDPM2AncestralDiscreteScheduler,
AutoencoderKL
)
# Load VAE component
vae = AutoencoderKL.from_pretrained(
"madebyollin/sdxl-vae-fp16-fix",
torch_dtype=torch.float16
)
# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
"dataautogpt3/ProteusV0.5",
vae=vae,
torch_dtype=torch.float16
)
pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')
# Define prompts and generate image
prompt = "a cat wearing sunglasses on the beach"
negative_prompt = ""
image = pipe(
prompt,
negative_prompt=negative_prompt,
width=1024,
height=1024,
guidance_scale=7,
num_inference_steps=50,
clip_skip=2
).images[0]
image.save("generated_image.png")
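The example above uses KDPM2AncestralDiscreteScheduler. If you want to match the DPM++ 2M SDE sampler with the Karras schedule from the recommended settings instead, diffusers exposes that combination through DPMSolverMultistepScheduler; the following configuration is my suggestion rather than part of the original example.

from diffusers import DPMSolverMultistepScheduler

# DPM++ 2M SDE with Karras sigmas, mirroring the recommended sampler settings
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True
)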