---
license: apache-2.0
base_model: t5-base
tags:
- text2text-generation
- prompt-enhancement
- ai-art
- image-generation
- prompt-engineering
- stable-diffusion
- midjourney
- dall-e
language:
- en
datasets:
- custom
metrics:
- bleu
- rouge
pipeline_tag: text2text-generation
widget:
- text: "Enhance this prompt: woman in red dress"
  example_title: "Basic Enhancement"
- text: "Enhance this prompt (no lora): cyberpunk cityscape"
  example_title: "Clean Enhancement"
- text: "Enhance this prompt (with lora): anime girl"
  example_title: "Technical Enhancement"
- text: "Simplify this prompt: A majestic dragon with golden scales soaring through stormy clouds"
  example_title: "Simplification"
model-index:
- name: t5-prompt-enhancer-v03
  results:
  - task:
      type: text2text-generation
      name: Prompt Enhancement
    metrics:
    - type: artifact_cleanliness
      value: 80.0
      name: Clean Output Rate
    - type: instruction_coverage
      value: 4
      name: Instruction Types
---

# 🎨 T5 Prompt Enhancer V0.3

**A T5-base model fine-tuned for AI art prompt enhancement, with four instruction types and LoRA control.**

Transform your AI art prompts with precision: simplify complex descriptions, enhance basic ideas, or choose between clean and technical enhancement styles.

## Quick Start

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("t5-prompt-enhancer-v03")
tokenizer = T5Tokenizer.from_pretrained("t5-prompt-enhancer-v03")

def enhance_prompt(text, style="clean"):
    """Generate an enhanced or simplified prompt for the chosen style."""
    if style == "clean":
        prompt = f"Enhance this prompt (no lora): {text}"
    elif style == "technical":
        prompt = f"Enhance this prompt (with lora): {text}"
    elif style == "simplify":
        prompt = f"Simplify this prompt: {text}"
    else:
        # Any other style value falls back to standard enhancement
        prompt = f"Enhance this prompt: {text}"

    inputs = tokenizer(prompt, return_tensors="pt", max_length=256, truncation=True)

    with torch.no_grad():
        outputs = model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_length=80,
            num_beams=2,
            repetition_penalty=2.0,
            no_repeat_ngram_size=3
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Examples
print(enhance_prompt("woman in red dress", "clean"))
# Example output: "a beautiful woman in a red dress with flowing hair, elegant pose, soft lighting"

print(enhance_prompt("anime girl", "technical"))
# Example output: "masterpiece, best quality, 1girl, solo, anime style, detailed background"

print(enhance_prompt("A majestic dragon with golden scales soaring through stormy clouds", "simplify"))
# Example output: "dragon flying through clouds"
```

## ✨ Key Features

### **Quad-Instruction Capability**
- **Simplify:** Reduce complex prompts to essential elements
- **Enhance:** Standard prompt improvement with balanced detail
- **Enhance (no lora):** Clean enhancement without technical artifacts
- **Enhance (with lora):** Technical enhancement with LoRA tags and quality descriptors

### 🎯 **Precision Control**
- Choose exactly the enhancement style you need
- Clean outputs for general use
- Technical outputs for advanced AI art workflows
- Bidirectional transformation (complex ↔ simple)

### **Training Excellence**
- **297K training samples** from 6 major AI art platforms
- **Subject diversity protection** caps over-represented subjects to reduce AI art bias
- **Multi-platform training** across Lexica, CGDream, Civitai, NightCafe, Kling, OpenArt
- **Smart data utilization** - uses both original and cleaned versions of prompts

## 🎭 Model Capabilities

### Enhancement Examples

| Input | Output Style | Result |
|-------|--------------|--------|
| "woman in red dress" | **Clean** | "a beautiful woman in a red dress with flowing hair, elegant pose, soft lighting" |
| "woman in red dress" | **Technical** | "masterpiece, best quality, 1girl, solo, red dress, detailed background, high resolution" |
| "Complex Victorian description..." | **Simplify** | "woman in red dress in ballroom" |
| "cat" | **Standard** | "cat sitting peacefully, photorealistic, detailed fur texture" |

### Instruction Format

```python
# Four supported instruction types:
"Enhance this prompt: {basic_prompt}"              # Balanced enhancement
"Enhance this prompt (no lora): {basic_prompt}"    # Clean, artifact-free
"Enhance this prompt (with lora): {basic_prompt}"  # Technical with LoRA tags
"Simplify this prompt: {complex_prompt}"           # Complexity reduction
```
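
As a minimal sketch, the four prefixes above can be kept in one lookup table so that a typo in a style name fails loudly instead of silently falling back to standard enhancement. The names `INSTRUCTION_PREFIXES` and `build_instruction` are hypothetical and not part of the model's API; only the prefix strings come from this card.

```python
# Hypothetical helper: maps a short style name to the instruction prefix
# documented above. Only the prefix strings themselves come from the model card.
INSTRUCTION_PREFIXES = {
    "standard": "Enhance this prompt: ",
    "clean": "Enhance this prompt (no lora): ",
    "technical": "Enhance this prompt (with lora): ",
    "simplify": "Simplify this prompt: ",
}


def build_instruction(text: str, style: str = "standard") -> str:
    """Return the full model input for one of the four instruction types."""
    if style not in INSTRUCTION_PREFIXES:
        valid = ", ".join(sorted(INSTRUCTION_PREFIXES))
        raise ValueError(f"Unknown style {style!r}; expected one of: {valid}")
    return INSTRUCTION_PREFIXES[style] + text.strip()


print(build_instruction("cyberpunk cityscape", "clean"))
# prints: Enhance this prompt (no lora): cyberpunk cityscape
```

The resulting string is what gets passed to the tokenizer.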

## Performance Metrics

### Training Statistics
- **Training Samples:** 297,282 (filtered from 316K)
- **Training Time:** 131 hours on RTX 3060
- **Final Loss:** 3.66
- **Model Size:** 222M parameters
- **Vocabulary:** 32,104 tokens

### Instruction Distribution
- **Enhance (no lora):** 32.6% (96,934 samples)
- **Enhance (standard):** 32.6% (96,907 samples)
- **Simplify:** 29.5% (87,553 samples)
- **Enhance (with lora):** 5.3% (15,888 samples)

### Platform Coverage
- **CGDream:** 94,362 samples (31.7%)
- **Lexica:** 75,142 samples (25.3%)
- **Civitai:** 66,880 samples (22.5%)
- **NightCafe:** 49,881 samples (16.8%)
- **Kling:** 10,179 samples (3.4%)
- **OpenArt:** 838 samples (0.3%)
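
The two breakdowns above partition the same 297,282 training samples. The short check below recomputes the totals and percentages from the listed counts; the variable and function names are illustrative only.

```python
# Counts copied from the Instruction Distribution and Platform Coverage lists.
TOTAL_SAMPLES = 297_282

instruction_counts = {
    "enhance_no_lora": 96_934,
    "enhance_standard": 96_907,
    "simplify": 87_553,
    "enhance_with_lora": 15_888,
}

platform_counts = {
    "CGDream": 94_362,
    "Lexica": 75_142,
    "Civitai": 66_880,
    "NightCafe": 49_881,
    "Kling": 10_179,
    "OpenArt": 838,
}


def shares(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Both breakdowns should add up to the same training-set size.
assert sum(instruction_counts.values()) == TOTAL_SAMPLES
assert sum(platform_counts.values()) == TOTAL_SAMPLES

print(shares(instruction_counts))
print(shares(platform_counts))
```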

## 🎯 Use Cases

### For Content Creators
```python
# Simplify complex prompts for broader audiences
enhance_prompt("masterpiece, ultra-detailed render of cyberpunk scene...", "simplify")
# → "cyberpunk city street at night"
```

### For AI Artists
```python
# Clean enhancement for professional work
enhance_prompt("sunset landscape", "clean")
# → "breathtaking sunset over rolling hills with golden light and dramatic clouds"

# Technical enhancement for specific workflows
enhance_prompt("anime character", "technical")
# → "masterpiece, best quality, 1girl, solo, anime style, detailed background"
```

### For Prompt Engineers
```python
# Bidirectional optimization
basic = "cat on chair"
enhanced = enhance_prompt(basic, "clean")
simplified = enhance_prompt(enhanced, "simplify")
# Optimize prompt complexity iteratively
```
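
One way to read "optimize prompt complexity iteratively" is a loop that keeps simplifying until the prompt fits a word budget. The sketch below assumes that reading. `fit_to_word_budget`, `max_words` and `max_rounds` are hypothetical names, and the `transform` argument stands in for `enhance_prompt` so the loop can be tried without loading the model.

```python
def fit_to_word_budget(prompt, transform, max_words=12, max_rounds=3):
    """Simplify `prompt` repeatedly until it has at most `max_words` words.

    `transform(text, style)` is any callable with the same signature as
    enhance_prompt; it is called with style="simplify".
    """
    current = prompt
    for _ in range(max_rounds):
        if len(current.split()) <= max_words:
            break
        shorter = transform(current, "simplify")
        if len(shorter.split()) >= len(current.split()):
            break  # no progress, so stop rather than loop on the same text
        current = shorter
    return current


def halve_words(text, style):
    """Stand-in for the model: keeps the first half of the words."""
    words = text.split()
    return " ".join(words[: max(1, len(words) // 2)])


print(fit_to_word_budget("a b c d e f g h", halve_words, max_words=3))
# prints: a b
```

With the model loaded, `fit_to_word_budget(long_prompt, enhance_prompt)` would apply the same loop to real outputs.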

## 🔧 Advanced Usage

### Custom Generation Parameters
```python
def generate_with_control(text, style="clean", creativity=0.7):
    """Advanced generation with creativity control."""
    style_prompts = {
        "clean": f"Enhance this prompt (no lora): {text}",
        "technical": f"Enhance this prompt (with lora): {text}",
        "simplify": f"Simplify this prompt: {text}",
        "standard": f"Enhance this prompt: {text}"
    }

    # Truncate to the 256-token input limit, as in Quick Start
    inputs = tokenizer(style_prompts[style], return_tensors="pt", max_length=256, truncation=True)

    if creativity > 0.5:
        # Creative mode: sampling, with creativity used as the temperature
        outputs = model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_length=100,
            do_sample=True,
            temperature=creativity,
            top_p=0.9,
            repetition_penalty=1.5
        )
    else:
        # Deterministic mode: beam search
        outputs = model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_length=80,
            num_beams=2,
            repetition_penalty=2.0,
            no_repeat_ngram_size=3
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

### Batch Processing
```python
def batch_enhance(prompts, style="clean"):
    """Process multiple prompts in one padded batch."""
    # Same style names as enhance_prompt; unknown styles fall back to standard
    prefixes = {
        "clean": "Enhance this prompt (no lora): ",
        "technical": "Enhance this prompt (with lora): ",
        "simplify": "Simplify this prompt: ",
    }
    prefix = prefixes.get(style, "Enhance this prompt: ")
    prefixed_prompts = [prefix + prompt for prompt in prompts]

    inputs = tokenizer(prefixed_prompts, return_tensors="pt", padding=True,
                       truncation=True, max_length=256)

    with torch.no_grad():
        outputs = model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,  # mask out padding tokens
            max_length=80,
            num_beams=2,
            repetition_penalty=2.0,
            pad_token_id=tokenizer.pad_token_id
        )

    return [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
```

## Model Comparison

| Feature | V0.1 | V0.2 | **V0.3** |
|---------|------|------|----------|
| **Training Data** | 48K | 174K | **297K** |
| **Instructions** | Enhancement only | Simplify + Enhance | **Quad-instruction** |
| **LoRA Handling** | Contaminated | Contaminated | **Controlled** |
| **Artifact Control** | None | None | **Explicit** |
| **Platform Coverage** | Limited | Good | **Comprehensive** |
| **User Control** | Basic | Moderate | **Complete** |

## 🛠️ Technical Details

### Architecture
- **Base Model:** T5-base (Google)
- **Parameters:** 222,885,120
- **Special Tokens:** `<simplify>`, `<enhance>`, `<no_lora>`, `<with_lora>`
- **Max Input Length:** 256 tokens
- **Max Output Length:** 512 tokens

### Training Configuration
- **Epochs:** 3
- **Batch Size:** 8 per device (effective: 16 with gradient accumulation)
- **Learning Rate:** 3e-4 with cosine scheduling
- **Optimization:** FP16 mixed precision, gradient checkpointing
- **Hardware:** Trained on RTX 3060 (131 hours)
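
These figures imply a rough optimizer-step count and time per step. The estimate below is a back-of-the-envelope sketch under assumptions the card does not state: a single GPU, all 297,282 samples used in every epoch, and the final partial batch kept.

```python
import math

# Figures from the model card
train_samples = 297_282
per_device_batch = 8
effective_batch = 16
epochs = 3
wall_clock_hours = 131

# Assumption: one GPU, so accumulation steps = effective / per-device batch
grad_accum_steps = effective_batch // per_device_batch

# Assumption: every sample is seen once per epoch and the last partial batch is kept
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
seconds_per_step = wall_clock_hours * 3600 / total_steps

print(grad_accum_steps, steps_per_epoch, total_steps, round(seconds_per_step, 2))
# prints: 2 18581 55743 8.46
```

If part of the data was held out for validation, the real step count would be somewhat lower.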

### Data Sources
Training data collected from:
- **Lexica** - Stable Diffusion prompt database
- **CGDream** - AI art community platform
- **Civitai** - Model sharing and prompt community
- **NightCafe** - AI art creation platform
- **Kling AI** - Text-to-image generation service
- **OpenArt** - AI art discovery platform

## ⚙️ Recommended Parameters

### For Consistent Results
```python
generation_config = {
    "max_length": 80,
    "num_beams": 2,
    "repetition_penalty": 2.0,
    "no_repeat_ngram_size": 3
}
```

### For Creative Variation
```python
creative_config = {
    "max_length": 100,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.3
}
```
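
Both dictionaries hold keyword arguments for `model.generate`, so they can be unpacked with `**`. The sketch below shows one possible way to pick a preset and override single values without mutating it. `pick_generation_kwargs` is a hypothetical helper, and the presets are repeated so the block runs on its own.

```python
# Presets repeated from the two blocks above.
GENERATION_CONFIG = {
    "max_length": 80,
    "num_beams": 2,
    "repetition_penalty": 2.0,
    "no_repeat_ngram_size": 3,
}

CREATIVE_CONFIG = {
    "max_length": 100,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.3,
}


def pick_generation_kwargs(creative=False, **overrides):
    """Return a copy of the chosen preset with any overrides applied."""
    base = CREATIVE_CONFIG if creative else GENERATION_CONFIG
    return {**base, **overrides}


kwargs = pick_generation_kwargs(creative=True, temperature=0.9)
print(kwargs["temperature"], kwargs["top_p"])
# prints: 0.9 0.9

# With `model` and `inputs` prepared as in Quick Start:
# outputs = model.generate(inputs.input_ids,
#                          attention_mask=inputs.attention_mask, **kwargs)
```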

## 🚨 Limitations

- **English Only:** Trained exclusively on English prompts
- **AI Art Domain:** Specialized for AI art prompts, may not generalize to other domains
- **LoRA Artifacts:** Technical enhancement mode may include platform-specific tags
- **Context Length:** Limited to 256 input tokens
- **Platform Bias:** Training data reflects current AI art platform distributions

## Evaluation Results

### Artifact Cleanliness
- **V0.1:** 100% clean (limited capability)
- **V0.2:** 80% clean (uncontrolled artifacts)
- **V0.3:** 80% clean + **user control** over artifact inclusion

### Instruction Coverage
- **Simplification:** ✅ Excellent (V0.2 level performance)
- **Standard Enhancement:** ✅ Good balance of detail and clarity
- **Clean Enhancement:** ✅ No technical artifacts when requested
- **Technical Enhancement:** ✅ Proper LoRA tags when requested

## 🎨 Example Workflows

### Content Creator Workflow
```python
# Start with basic idea
idea = "fantasy castle"

# Create clean version for general audience
clean_version = enhance_prompt(idea, "clean")
# → "A majestic fantasy castle with towering spires and magical aura"

# Create detailed version for AI art generation
detailed_version = enhance_prompt(idea, "technical")
# → "masterpiece, fantasy castle, detailed architecture, magical atmosphere, high quality"
```

### Prompt Engineering Workflow
```python
# Iterative refinement
original = "A complex, detailed description of a beautiful woman..."
simplified = enhance_prompt(original, "simplify")
# → "beautiful woman portrait"

refined = enhance_prompt(simplified, "clean")
# → "elegant woman portrait with soft lighting and natural beauty"
```

## Training Data Details

### Subject Diversity Protection
Caps applied to the training data to reduce AI art bias:
- Female subjects: 20% max (reduced from typical 35%+ in raw data)
- "Beautiful" descriptor: 6% max
- Anime style: 10% max
- Dress/clothing focus: 8% max
- LoRA contaminated samples: 15% max
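
The card does not describe how these caps were enforced. One simple way to cap a single attribute is to keep every non-matching sample and randomly downsample the matching ones until their share of the result is at or below the limit. The sketch below illustrates that idea only and is not the project's actual sampler; `cap_share` and its parameters are hypothetical names.

```python
import math
import random


def cap_share(samples, has_attribute, max_share, seed=0):
    """Downsample items matching `has_attribute` so their share <= max_share.

    Keeps all non-matching items. If k matching items are kept alongside
    n others, k / (k + n) <= max_share requires k <= max_share * n / (1 - max_share).
    """
    if not 0 <= max_share < 1:
        raise ValueError("max_share must be in [0, 1)")
    matching = [s for s in samples if has_attribute(s)]
    others = [s for s in samples if not has_attribute(s)]
    # Small epsilon guards against floating-point results like 14.999999999
    limit = math.floor(max_share * len(others) / (1 - max_share) + 1e-9)
    if len(matching) > limit:
        matching = random.Random(seed).sample(matching, limit)
    return others + matching


# Toy data: 40 of 100 prompts mention "beautiful"; cap that at 20%.
prompts = ["beautiful portrait"] * 40 + ["mountain landscape"] * 60
capped = cap_share(prompts, lambda p: "beautiful" in p, max_share=0.20)
kept = sum("beautiful" in p for p in capped)
print(kept, len(capped))
# prints: 15 75
```

Applying several caps at once is harder, because removing samples for one attribute changes the shares of the others; this sketch handles one attribute at a time.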

### Data Processing Pipeline
1. **Collection:** Multi-platform scraping with quality filtering
2. **Cleaning:** LoRA artifact detection and removal
3. **Enhancement:** BLIP2 visual captioning for training pairs
4. **Protection:** Subject diversity sampling to reduce bias
5. **Balancing:** Near-equal distribution across the three main instruction types, with a smaller share of `with lora` samples
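
To make the cleaning step concrete, here is a minimal sketch of LoRA artifact detection and removal. It assumes "LoRA artifacts" means inline tags of the form `<lora:name:weight>`, as used in Stable Diffusion web UI prompts. The card does not spell out the project's actual rules, so the pattern and function names below are illustrative.

```python
import re

# Assumption: an artifact is an inline tag such as <lora:animeStyle:0.8>
LORA_TAG = re.compile(r"<lora:[^<>]+>", re.IGNORECASE)


def has_lora_tags(prompt):
    """Return True if the prompt contains at least one <lora:...> tag."""
    return LORA_TAG.search(prompt) is not None


def strip_lora_tags(prompt):
    """Remove <lora:...> tags, then tidy leftover commas and whitespace."""
    without_tags = LORA_TAG.sub("", prompt)
    # Normalise spacing inside each comma-separated part and drop empty parts
    parts = [" ".join(part.split()) for part in without_tags.split(",")]
    return ", ".join(part for part in parts if part)


raw = "masterpiece, <lora:animeStyle:0.8>, 1girl, solo"
print(has_lora_tags(raw))
# prints: True
print(strip_lora_tags(raw))
# prints: masterpiece, 1girl, solo
```

Keeping both the raw and the stripped prompt yields an "original" and a "cleaned" version of the same prompt.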

## 🔬 Research Applications

### Prompt Engineering Research
- Systematic prompt transformation studies
- Enhancement vs simplification trade-offs
- Cross-platform prompt adaptation

### AI Art Bias Studies
- Diversity-protected training methodologies
- Platform-specific prompt pattern analysis
- Controlled artifact generation studies

### Multi-Modal AI Research
- Text-to-image prompt optimization
- Cross-modal content adaptation
- User preference modeling for prompt styles

## Citation

```bibtex
@misc{t5_prompt_enhancer_v03,
  title={T5 Prompt Enhancer V0.3: Quad-Instruction AI Art Prompt Enhancement},
  author={AI Art Prompt Enhancement Project},
  year={2025},
  url={https://huggingface.co/t5-prompt-enhancer-v03},
  note={T5-base model fine-tuned for quad-instruction AI art prompt enhancement with LoRA control},
  training_data={297K samples from 6 AI art platforms},
  capabilities={simplification, enhancement, lora_control, artifact_cleaning}
}
```

## 🤝 Community

### Contributing
- **Data Quality:** Help improve training data quality
- **Evaluation:** Contribute evaluation prompts and test cases
- **Multi-language:** Expand to non-English prompts
- **Platform Coverage:** Add new AI art platforms

### Support
- **Issues:** Report bugs and feature requests
- **Discussions:** Share use cases and improvements
- **Examples:** Contribute workflow examples

## 🎯 Version History

### V0.3 (Current) - September 2025
- ✅ Quad-instruction capability (4 instruction types)
- ✅ LoRA artifact control
- ✅ 297K training samples with diversity protection
- ✅ Enhanced platform coverage
- ✅ Smart data utilization (original + cleaned versions)

### V0.2 - August 2025
- ✅ Bidirectional capability (simplify + enhance)
- ✅ 174K training samples
- ⚠️ Uncontrolled LoRA artifacts

### V0.1 - July 2025
- ✅ Basic enhancement capability
- ✅ 48K training samples
- ❌ Enhancement only, no simplification

## 🔮 Future Roadmap

### V0.4 (Planned)
- [ ] Multi-language support (Spanish, French, German)
- [ ] Style-specific enhancement (realistic, anime, artistic)
- [ ] Platform-aware generation
- [ ] Quality scoring integration

### V0.5 (Future)
- [ ] Multi-modal input support
- [ ] Real-time prompt optimization
- [ ] User preference learning
- [ ] Cross-platform prompt translation

## Performance Benchmarks

### Speed
- **Inference Time:** ~0.5-2.0 seconds per prompt (RTX 3060)
- **Memory Usage:** ~2GB VRAM for inference
- **Throughput:** ~30-60 prompts/minute depending on complexity
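
Latency depends heavily on hardware, so these figures are worth re-measuring locally. The helper below is a generic timing sketch. `measure_throughput` is a hypothetical name, and the stand-in function only sleeps so the block runs without the model.

```python
import time


def measure_throughput(fn, prompts):
    """Call fn on each prompt; return (seconds per prompt, prompts per minute)."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    start = time.perf_counter()
    for prompt in prompts:
        fn(prompt)
    elapsed = time.perf_counter() - start
    seconds_per_prompt = elapsed / len(prompts)
    per_minute = 60.0 / seconds_per_prompt if seconds_per_prompt > 0 else float("inf")
    return seconds_per_prompt, per_minute


def fake_model(prompt):
    """Stand-in for enhance_prompt: pretends each prompt takes about 10 ms."""
    time.sleep(0.01)
    return prompt


seconds, per_minute = measure_throughput(fake_model, ["cat", "dog", "castle"])
print(f"{seconds:.3f} s/prompt, {per_minute:.0f} prompts/min")
```

With the model loaded, `measure_throughput(lambda p: enhance_prompt(p, "clean"), prompts)` times real generation. A warm-up call before timing avoids counting one-time setup cost.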

### Quality Metrics
- **Simplification Accuracy:** 95%+ core element preservation
- **Enhancement Quality:** Rich detail addition without over-complication
- **Artifact Control:** 80%+ clean outputs when requested
- **Instruction Following:** 98%+ correct instruction interpretation

## 🏷️ Tags

`text2text-generation` `prompt-enhancement` `ai-art` `stable-diffusion` `midjourney` `dall-e` `prompt-engineering` `lora-control` `bidirectional` `artifact-cleaning`

---

**🎨 Built for the AI art community - Transform your prompts with precision and control!**

*Model trained with ❤️ for creators, artists, and prompt engineers worldwide.*