import base64
from io import BytesIO
from textwrap import dedent

import gradio as gr
import jinja2
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment.
client = OpenAI()

GENERAL_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for cinematic-style image generation. Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors. Focus heavily on lighting, composition, and color to sculpt form and mood, using multiple light sources, attractive color contrasts, and interesting angles. Choose the artistic style, color grading, and atmosphere that best enhance the subject and context of the prompt, creating a cohesive and visually compelling image. Make sure that the background is very cool and suits the prompt. Make sure that the prompt is very aesthetic, creative and vivid.

Tips:
- Make sure the prompt is not too long.
- From the photo, only include the subject's facial features in the prompt. Ignore the background and the clothes of the subject in the photo.
- Use dynamic camera angles and poses if appropriate.
- **You are creating art.** There should be a distinct style and aesthetic to the prompt. The generated image should be something that could be printed on a poster. Have a surprise factor.

Examples:
Input: A photo of me in a race bib
Input photo: Black man
Output prompt: A stylized, cinematic portrait of a Black man captured from the chest up, set against a glowing deep red background. The image is tightly framed in vertical format, emphasizing his upper torso, neck, and face in moody, directional light. He wears a torn black tank top with rugged edges and a marathon race bib pinned to the front. Around his neck hangs a thin silver chain.
His hair is styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, with a subtle engraving across the lens that is visible in the reflections. The lighting is soft but focused, casting strong shadow contours along his collarbone and highlighting the reflective elements of both glasses and sweat on his skin. The mood is intense and editorial — a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle rebellion. The torn shirt and race bib hint at exertion and context, while the engraved eyewear and red glow turn the portrait into a branded fashion statement.
Why the output is good:
- The detailed styling (torn tank top, race bib, metallic sunglasses) gives the portrait a distinct identity.
- Specific lighting directions (soft but focused, shadow contours) shape the mood.

Input: A photo of me in a pool
Input photo: A muscular man
Output prompt: A top-down editorial photo of a muscular man falling off a bright pink inflatable pool float, mid-fall with his body twisting toward the water. He wears black swim shorts and silver Oakley sunglasses. His arms are flailing slightly, and water droplets hang frozen in the air around him, hit by harsh flash. The float is distorted by motion, and splash trails from his legs as they hit the surface. The pool is a sunlit turquoise, with subtle tile reflection and lens specks near the corners. There's bloom from the water highlights, and the entire shot has an analog, fashion-campaign feel with no visible grain. Use a Photorealistic Style. Resolution 1792x1024. Fisheye! Motion blur
Why the output is good:
- Unique perspective (top-down) combined with dynamic action (falling off, mid-fall, twisting, flailing).
- Specifies analog, fashion-campaign feel but requests no visible grain, guiding the texture.
- Adding Fisheye and Motion blur at the end reinforces these key elements.
Input: A photo of me as Batman
Input photo: Asian man
Output prompt: Portrait of an Asian man as Batman in the style of Rembrandt, black and white, chiaroscuro lighting, deep shadows, and luminous highlights. His face emerges from darkness, one eye catching a sliver of light, the other lost in shadow. The cowl is rendered like aged leather, with thick, textured brushstrokes and visible impasto. The Bat-symbol is faint, almost erased, as if worn by time. Background: void of form, only grain and darkness. Style: baroque oil painting translated to monochrome — dramatic, emotional
Why the output is good:
- The overall style fits the theme of Batman.

Here is the user's prompt:
{{ user_prompt }}
""")

FASHION_PROMPT_TEMPLATE = jinja2.Template("""Generate a striking fashion editorial photograph with strong conceptual impact and visual distinction. Focus on capturing a model in a powerful pose or moment that showcases both their features and the styling elements (clothing, accessories, makeup) in a compelling context. Utilize bold lighting techniques (e.g., hard shadow play, colored gels, dramatic high-key or low-key setups) and innovative composition (e.g., unconventional cropping, extreme perspectives, symmetry/asymmetry) to create a distinctive mood. Incorporate environmental elements or props that enhance the narrative. The final image should balance artistic expression with commercial appeal, conveying a specific attitude, concept, or emotional tone while maintaining the fashion focus.

Analysis: Fashion editorial prompts aim to create images with both artistic and commercial value. Success often comes from a precise balance of styling details, environmental context, and technical elements like lighting and composition. Strong concepts and bold visual choices typically yield the most compelling results. Describing makeup, accessories, and specific fashion elements in detail helps the AI create cohesive styling.
Tips for Success:
● Specify the precise styling (clothing items, fabrics, colors, fit, accessories)
● Detail the model's features and pose (expression, positioning, gesture)
● Describe the makeup and hair with specificity (textures, colors, style)
● Define the lighting setup (direction, quality, color, shadow effects)
● Include props or environmental elements that enhance the concept
● Suggest a brand or editorial reference for stylistic guidance
● Add compositional directions (framing, cropping, perspective)

Keywords: fashion editorial, fashion photography, studio photography, location editorial, model pose, conceptual fashion, striking composition, dramatic lighting, hard light, soft light, colored gels, editorial makeup, fashion styling, haute couture, ready-to-wear, fashion concept, art direction, commercial appeal, [brand reference]

Examples:
Prompt 1 (OHNEIS Runner):
Output prompt: A stylized, cinematic portrait of a Black man captured from the chest up, set against a glowing deep red background. The image is tightly framed in vertical format, emphasizing his upper torso, neck, and face in moody, directional light. He wears a torn black tank top with rugged edges and a marathon race bib pinned to the front reading "69" with the word "OHNEIS" printed boldly underneath. Around his neck hangs a thin silver chain. His hair is styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, with the word "ohneis" engraved across the lens — subtly visible in the reflections. The lighting is soft but focused, casting strong shadow contours along his collarbone and highlighting the reflective elements of both glasses and sweat on his skin. The mood is intense and editorial — a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle rebellion. The torn shirt and race bib hint at exertion and context, while the engraved eyewear and red glow turn the portrait into a branded fashion statement.
Sora Companion Prompt: A high-impact, vertically framed editorial portrait of a male runner against a rich red backdrop. The athlete wears a distressed black tank top with a torn collar and a large pinned-on race number reading "69" above the bold white word "OHNEIS" in a rectangular black bar. His face is partially shadowed, lit with cinematic precision that accentuates the contours of his skin and shoulders. He wears reflective blue OHNEIS sports sunglasses with the brand name clearly legible across the lens. His expression is stoic, exuding focus and control. The image balances a gritty, competitive energy with futuristic fashion elements and controlled studio lighting.
● Analysis: This prompt masterfully blends fashion and athletic elements. The detailed styling (torn tank top, race bib, metallic sunglasses) creates brand identity. Specific lighting directions (soft but focused, shadow contours) shape the mood. The Sora prompt maintains the exact styling while adding subtle emotional elements (stoic expression, focus and control) for video continuity.
● Variations & Keywords: Try different backgrounds (urban setting, abstract color field), lighting scenarios (harsh backlight, cool blue tones), model demographics, or accessories (compression sleeves, techwear). Keywords: athletic fashion, editorial portrait, directional lighting, race bib, futuristic eyewear, brand styling, reflective elements, torn fabric, cinematic portrait, color-blocking.

Prompt 2 (Sprinting Runner):
Output prompt: A vertical-format, side-profile flash photograph capturing a Black male runner sprinting down a sunlit urban street from an elevated angle. The camera looks slightly down at the scene, placing the runner in the center-right of the frame, mid-stride with one leg extended behind and arms pumping forward. He wears a reflective silver windbreaker, black running shorts, white socks, and sleek performance shoes. A pair of dark sunglasses adds attitude and edge to his motion.
The runner is in motion blur, especially on limbs and head, with only parts of the torso and upper back lightly frozen by a directional rear-curtain sync flash. His movement arcs forward across the frame, and the reflective jacket catches intense flashes of light, bouncing subtle highlights across the scene. Below the asphalt road, a strip of green grass borders the street at the bottom edge of the image, adding a clean contrasting base to the composition. The background is dark asphalt, textured with faint painted lines and subtle shadows. The elevated camera position allows for a sense of depth and rhythm as the runner cuts across the frame from left to right, motion trailing behind. Warm natural light streaks or golden ambient flares may bleed across the top of the image for added cinematic tension.
● Analysis: This prompt creates dynamic motion through specific technical directions (rear-curtain sync flash, motion blur). The clothing details (reflective silver windbreaker) add visual interest through light interaction. The composition instructions (elevated angle, center-right placement) guide spatial understanding. Environmental elements (grass strip, asphalt texture) add realism and grounding.
● Variations & Keywords: Change the setting (beach, track, trail), lighting (twilight, rainy day), clothing (branded apparel, minimal gear), or technique (front-curtain sync, pan blur). Keywords: sports fashion, running, motion photography, rear-curtain sync, flash photography, reflective materials, urban environment, elevated angle, side profile, dynamic pose, directional movement.

Prompt 3 (Track Athlete):
Output prompt: A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black female track runner launching from the starting blocks at night.
The image is taken from a low, side angle, tightly framed at ground level, with her silver sprinting spikes clearly visible — one foot pushing forcefully into the rear block, the other caught mid-air in dramatic motion. She wears white ankle-high performance socks, and her defined, muscular calves are frozen in the peak of exertion. The flash lighting from the front-left casts sharp highlights on her skin and the metallic texture of the shoes, while the surrounding track surface — deep blue and textured — catches scattered moisture droplets that shimmer in the light. The starting blocks behind her blur slightly, and her trailing leg dissolves into motion streaks, captured using a slow shutter speed with rear-curtain sync to enhance the sense of explosive movement. The background is minimal and moody: abstract light streaks from stadium lighting stretch diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek, raw, and cinematic — focused on power, speed, and launch precision.
● Analysis: This prompt uses extreme close-up composition to transform a sports moment into fashion art. The visual contrast between static elements (starting blocks, track surface) and dynamic elements (mid-air foot, motion streaks) creates tension. Technical specifications (flash from front-left, slow shutter, rear-curtain sync) guide the lighting and motion effects precisely. Texture details (moisture droplets, metallic shoes) add depth and realism.
● Variations & Keywords: Try different sports (swimming dive, basketball jump shot), perspectives (from front, from above), lighting (daylight, colored gels), or focusing on different body parts (hands, torso). Keywords: athletic footwear, sprinting spikes, track and field, starting blocks, flash photography, low angle, extreme close-up, rear-curtain sync, moisture droplets, motion blur, explosive movement, performance apparel.

You need to enhance the following prompt according to the guide above.
Only output the prompt, no other text.
{{ user_prompt }}
""")


def encode_image_as_data_url(image):
    """Encode a PIL image as a base64 JPEG data URL for the `input_image` content part."""
    buffer = BytesIO()
    # JPEG has no alpha channel, so convert (e.g. RGBA PNGs) to RGB first.
    image.convert("RGB").save(buffer, format="JPEG", quality=90)
    b64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return f"data:image/jpeg;base64,{b64_image}"


def process_prompt(image, target_label, user_prompt, style):
    """Expand `user_prompt` into a styled image-generation prompt using the reference image."""
    if image is None:
        raise gr.Error("Please upload a reference image.")
    image_url = encode_image_as_data_url(image)

    if style == "Chromatic Cinematic":
        system_content = dedent("""\
            You are an expert prompt engineer for chromatic cinematic-style image generation.
            Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image with strong contrast and aesthetic color grading such as Wes Anderson's.
            Frame close to the camera so the subject is immediately recognizable, emphasizing dynamic and exaggerated editorial posing.
            Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject—examples like architectural beams, diagonal staircases, waves, or shadows can inspire but do not need to be used literally.
            Focus heavily on lighting to sculpt the form and mood, using two lighting sources from different directions, attractive color combinations, and interesting lighting angles (e.g., dramatic diagonal or overhead from the top-left corner).
            When referencing a style like Wes Anderson, describe the scene, composition, or color grading (e.g., bold symmetry, saturated pastels) without simply copying his visuals.
            Use a photorealistic style. Resolution 1792x1024.
            """).strip()
        user_content = (
            "Use the uploaded image to infer the subject's appearance attributes. "
            "Instead of referring to the subject with pronouns in the prompt "
            "(e.g., me/she sitting on a chair), use those attributes to describe "
            "the subject (e.g., the man with glasses sitting on the chair). "
            "Then transform this prompt into a detailed chromatic cinematic style "
            f"description. User's prompt: {user_prompt}"
        )
    elif style == "Film Noir":
        system_content = dedent("""\
            You are an expert prompt engineer for cinematic-style image generation in the film noir aesthetic.
            Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image with high contrast, deep shadows, and moody lighting characteristic of classic noir.
            Frame close to the camera so the subject is immediately recognizable, emphasizing tense, dramatic, or expressive editorial posing.
            Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject—examples like rain-slicked streets, lampposts casting long shadows, Venetian blinds, or fog can inspire but do not need to be used literally.
            Focus heavily on lighting to sculpt form and mood, using harsh key lights, soft fill lights, and strong directional shadows to create tension and depth.
            When referencing a style like film noir, describe the scene, composition, or tonal contrasts (e.g., stark black-and-white contrasts, smoky atmospheres, reflective wet surfaces) without simply copying existing visuals.
            Use a photorealistic style. Resolution 1792x1024.
            """).strip()
        user_content = (
            "Use the uploaded image to infer the subject's appearance and incorporate accurate descriptors. "
            f"User's prompt: {user_prompt}"
        )
    elif style == "General":
        system_content = "You are an expert prompt engineer."
        user_content = GENERAL_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
    elif style == "Fashion":
        system_content = "You are an expert prompt engineer."
        user_content = FASHION_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
    else:
        # Guard against an unset or unknown style; otherwise system_content
        # and user_content would be unbound when the request is built.
        raise gr.Error(f"Unsupported style: {style!r}")

    response = client.responses.create(
        model="gpt-5",
        reasoning={"effort": "low"},
        input=[
            {"role": "system", "content": system_content},
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": user_content},
                    {"type": "input_image", "image_url": image_url},
                ],
            },
        ],
    )

    # Append the user-supplied target label to the enhanced prompt.
    label = (target_label or "").strip()
    return f"{response.output_text} {label}".strip()


demo = gr.Interface(
    fn=process_prompt,
    inputs=[
        gr.Image(label="Upload reference image", type="pil"),
        gr.Textbox(label="Enter target label", placeholder="SMRA"),
        gr.Textbox(
            label="Enter your prompt",
            placeholder="picture of me while sitting in a chair in the ocean",
        ),
        gr.Dropdown(
            # "Chromatic Cinematic" and "Film Noir" are also handled in
            # process_prompt but are not currently exposed in the UI.
            choices=["General", "Fashion"],
            value="General",
            label="Style Selection",
            info="Choose the visual style for your enhanced prompt",
        ),
    ],
    outputs=gr.Textbox(label="Style Prompt", lines=20),
)

if __name__ == "__main__":
    demo.launch()