ShinnosukeU committed on
Commit 53ef603 · verified · 1 Parent(s): b3cc4d3

Upload folder using huggingface_hub

Files changed (6)
  1. .gitignore +11 -0
  2. .python-version +1 -0
  3. README.md +2 -8
  4. app.py +143 -0
  5. pyproject.toml +11 -0
  6. uv.lock +0 -0
.gitignore ADDED
@@ -0,0 +1,11 @@
+ venv/
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ .env
+ .vscode/
+ .DS_Store
+ *.log
+ .venv/
+ .gradio/
.python-version ADDED
@@ -0,0 +1 @@
+ 3.12.7
README.md CHANGED
@@ -1,12 +1,6 @@
  ---
- title: Diffusion Prompt Generator
- emoji: 📈
- colorFrom: indigo
- colorTo: blue
+ title: diffusion-prompt-generator
+ app_file: app.py
  sdk: gradio
  sdk_version: 5.45.0
- app_file: app.py
- pinned: false
  ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,143 @@
+ import base64
+ from io import BytesIO
+ from textwrap import dedent
+
+ import gradio as gr
+ import jinja2
+ from openai import OpenAI
+
+ client = OpenAI()
+
+
+ GENERAL_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for cinematic-style image generation.
+
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
+
+ Emphasize dynamic and engaging editorial posing. Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject—examples like architecture, nature, lighting, or abstract forms can inspire but do not need to be used literally. Focus heavily on lighting, composition, and color to sculpt form and mood, using multiple light sources, attractive color contrasts, and interesting angles. Choose the artistic style, color grading, and atmosphere that best enhance the subject and context of the prompt, creating a cohesive and visually compelling image without being constrained to any particular existing style. Use a photorealistic style. Make sure that the background is very cool and suits the prompt. Make sure that the prompt is very aesthetic, creative and vivid.
+
+ Tips:
+ - Make sure the prompt is not too long.
+ - Only include facial features of the subject in the prompt from the photo. Ignore the background or the clothes of the subject in the photo.
+ - Use dynamic camera angles and poses if appropriate.
+
+ Examples:
+ Input: A photo of me in a race bib
+ Input photo: Black man
+ Output prompt: A stylized, cinematic portrait of a Black man captured from the chest up, set against a
+ glowing deep red background. The image is tightly framed in vertical format, emphasizing his
+ upper torso, neck, and face in moody, directional light. He wears a torn black tank top with
+ rugged edges and a marathon race bib pinned to the front. Around his neck hangs a thin silver chain. His hair is
+ styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, engraved across the lens — subtly visible in the reflections. The lighting is
+ soft but focused, casting strong shadow contours along his collarbone and highlighting the
+ reflective elements of both glasses and sweat on his skin. The mood is intense and editorial
+ — a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
+ rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
+ eyewear and red glow turn the portrait into a branded fashion statement.
+
+ Why the output is good:
+ - The detailed styling (torn tank top, race bib, metallic sunglasses) grounds the scene in concrete detail.
+ - Specific lighting directions (soft but focused, shadow contours) shape the mood.
+
+ Input: A photo of me in a pool
+ Input photo: A muscular man
+ Output prompt: A top-down editorial photo of a muscular man falling off a bright pink inflatable pool float,
+ mid-fall with his body twisting toward the water. He wears black swim shorts and silver
+ Oakley sunglasses. His arms are flailing slightly, and water droplets hang frozen in the air
+ around him, hit by harsh flash. The float is distorted by motion, and splash trails from his legs
+ as they hit the surface. The pool is a sunlit turquoise, with subtle tile reflection and lens
+ specks near the corners. There's bloom from the water highlights, and the entire shot has an
+ analog, fashion-campaign feel with no visible grain. Use a Photorealistic Style. Resolution
+ 1792x1024. Fisheye! Motion blur
+
+ Why the output is good:
+ - Unique perspective (top-down) combined with dynamic action (falling off,
+ mid-fall, twisting, flailing).
+ - Specifies analog, fashion-campaign feel but requests no visible grain, guiding the texture.
+ - Adding Fisheye and Motion blur at the end reinforces these key elements.
+
+ Input: A photo of me as Batman
+ Input photo: Asian man
+ Output prompt: Portrait of an Asian man as Batman in the style of Rembrandt black and white, chiaroscuro lighting, deep shadows, and luminous highlights. His face emerges from darkness, one eye catching a sliver of light, the other lost in shadow. The cowl is rendered like aged leather, with thick, textured brushstrokes and visible impasto. The Batsymbol is faint, almost erased, as if worn by time. Background: void of form, only grain and darkness. Style: baroque oil painting translated to monochrome — dramatic, emotional
+
+ Why the output is good:
+ - The overall style fits the Batman theme.
+
+ Here is the user's prompt:
+ {{ user_prompt }}
+ """)
+
+ def process_prompt(image, target_label, user_prompt, style):
+     if image is None:
+         raise gr.Error("Please upload a reference image.")
+     buffer = BytesIO()  # re-encode the upload as JPEG and embed it as a base64 data URL
+     image.convert("RGB").save(buffer, format="JPEG", quality=90)
+     b64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
+     image_url = f"data:image/jpeg;base64,{b64_image}"
+
+     if style == "Chromatic Cinematic":
+         system_content = """You are an expert prompt engineer for chromatic cinematic-style image generation. Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image with strong contrast and aesthetic color grading such as Wes Anderson. Frame close to the camera so the subject is immediately recognizable, emphasizing dynamic and exaggerated editorial posing. Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject—examples like architectural beams, diagonal staircases, waves, or shadows can inspire but do not need to be used literally. Focus heavily on lighting to sculpt the form and mood, using two lighting sources from different directions, attractive color combinations, and interesting lighting angles (e.g., dramatic diagonal or overhead from the top-left corner). When referencing a style like Wes Anderson, describe the scene, composition, or color grading (e.g., bold symmetry, saturated pastels) without simply copying his visuals. Use a photorealistic style. Resolution 1792x1024."""
+
+         user_content = (
+             "Use the uploaded image to infer the subject's appearance attributes. Instead of referencing pronouns in the prompt (e.g., me/she sitting on a chair), use the attributes to describe the subject (e.g., the man with the glasses sitting on the chair). "
+             f"Then transform this prompt into a detailed chromatic cinematic style description: User's prompt: {user_prompt}"
+         )
+     elif style == "Film Noir":
+         system_content = "You are an expert prompt engineer for cinematic-style image generation in the film noir aesthetic. Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image with high contrast, deep shadows, and moody lighting characteristic of classic noir. Frame close to the camera so the subject is immediately recognizable, emphasizing tense, dramatic, or expressive editorial posing. Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject—examples like rain-slicked streets, lampposts casting long shadows, Venetian blinds, or fog can inspire but do not need to be used literally. Focus heavily on lighting to sculpt form and mood, using harsh key lights, soft fill lights, and strong directional shadows to create tension and depth. When referencing a style like film noir, describe the scene, composition, or tonal contrasts (e.g., stark black-and-white contrasts, smoky atmospheres, reflective wet surfaces) without simply copying existing visuals. Use a photorealistic style. Resolution 1792x1024."
+         user_content = (
+             "Use the uploaded image to infer the subject's appearance and incorporate accurate descriptors. "
+             f"User's prompt: {user_prompt}"
+         )
+
+     elif style == "General":
+         system_content = "You are an expert prompt engineer."
+         user_content = GENERAL_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
+     else:
+         raise gr.Error(f"Unsupported style: {style!r}")
+     response = client.responses.create(
+         model="gpt-5",
+         reasoning={"effort": "low"},
+         input=[
+             {
+                 "role": "system",
+                 "content": system_content
+             },
+             {
+                 "role": "user",
+                 "content": [
+                     {"type": "input_text", "text": user_content},
+                     {"type": "input_image", "image_url": image_url}
+                 ]
+             }
+         ],
+     )
+     return f"{response.output_text} {target_label.strip()}"
+
+ demo = gr.Interface(
+     fn=process_prompt,
+     inputs=[
+         gr.Image(
+             label="Upload reference image",
+             type="pil",
+         ),
+         gr.Textbox(
+             label="Enter target label",
+             placeholder="SMRA",
+         ),
+         gr.Textbox(
+             label="Enter your prompt",
+             placeholder="picture of me while sitting in a chair in the ocean",
+         ),
+         gr.Dropdown(
+             choices=["General"],
+             # choices=["Chromatic Cinematic", "Film Noir", "General"],
+             label="Style Selection",
+             info="Choose the visual style for your enhanced prompt"
+         ),
+     ],
+     outputs=gr.Textbox(
+         label="Style Prompt",
+         lines=20,
+     ),
+ )
+
+ demo.launch()
pyproject.toml ADDED
@@ -0,0 +1,11 @@
+ [project]
+ name = "prompt-aesthics"
+ version = "0.1.0"
+ description = "Add your description here"
+ readme = "README.md"
+ requires-python = ">=3.12.7"
+ dependencies = [
+     "gradio>=5.45.0",
+     "jinja2>=3.1.6",
+     "openai>=1.107.1",
+ ]
uv.lock ADDED
The diff for this file is too large to render. See raw diff
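
Note on app.py: the image-encoding step in process_prompt (JPEG bytes, then base64, then a data URL) can be sketched in isolation. This is a minimal, standard-library-only sketch, not the committed code: the helper name to_data_url is hypothetical, and it assumes the JPEG bytes are already available instead of being produced from a PIL image as app.py does.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Wrap raw image bytes in a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for real JPEG data in this sketch.
sample_bytes = b"\xff\xd8\xff\xe0 placeholder payload"
url = to_data_url(sample_bytes)

# Split the URL back into its header and base64 payload to inspect it.
header, payload = url.split(",", 1)
print(header)
```

A string built this way has the same shape as the image_url value that app.py passes in the input_image content part.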