title: Scriptura
short_description: MultiAgent System for Screenplay Creation and Editing
emoji: 🎞️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: true
license: mit
tag: agent-demo-track
Scriptura: A Multi-Agent System for Screenplay Creation and Editing
The explanation video is available here
The sample screenplay used in the video is available here
Introduction
Scriptura is a multi-agent AI framework built on Hugging Face smolagents. It streamlines the creation of screenplays, storyboards, and soundtracks by automating analysis, summarization, and multimodal enrichment, freeing authors to focus on the creative work itself.
At its heart:
- Qwen3-32B powers the primary orchestrating agent, which coordinates workflows and handles high-level reasoning across the system.
- Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, accepting both text and image inputs to refine narrative elements and prepare them for downstream generation.
For media generation, Scriptura integrates:
- MusicGen models (from Meta's AudioCraft library), deployed via Hugging Face Spaces, which let the agent produce original soundtracks and sound effects from a text prompt alone or from a text prompt combined with an audio sample.
- FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation, ideal for storyboards, concept art, and visual references that seamlessly tie into the narrative flow.
Optionally, Scriptura can query external sources (e.g., through a DuckDuckGo search integration) to pull in reference scripts, sound samples, or research material, so that drafts are informed by relevant context as well as creatively rich.
Agent Capabilities
Scriptura provides a rich set of agents and tools to cover the full screenplay production and enrichment pipeline:
Text Analysis & Summarization
- Automatically extracts key themes, character arcs, and plot points
- Segments and summarizes scenes for rapid iteration
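As a rough illustration of scene segmentation, assuming a conventionally formatted plain-text screenplay in which every scene opens with a slugline such as INT. KITCHEN - NIGHT, the script could be split with a regular expression before each scene is sent to the LLM for summarization. The function name split_scenes and the slugline pattern are assumptions for this sketch, not Scriptura's actual implementation.

```python
import re
from typing import List, Tuple

# Assumed slugline pattern (hypothetical): a line starting with
# INT./EXT., INT., EXT. or I/E., followed by the scene description.
SLUGLINE = re.compile(r"^(?:INT\./EXT\.|INT\.|EXT\.|I/E\.?)\s+.+$", re.MULTILINE)


def split_scenes(script: str) -> List[Tuple[str, str]]:
    """Split a plain-text screenplay into (heading, body) pairs.

    Text before the first slugline (title page, FADE IN:, notes) is ignored.
    """
    matches = list(SLUGLINE.finditer(script))
    scenes = []
    for i, match in enumerate(matches):
        # A scene's body runs from the end of its slugline to the next slugline.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script)
        heading = match.group(0).strip()
        body = script[match.end():end].strip()
        scenes.append((heading, body))
    return scenes


if __name__ == "__main__":
    sample = """FADE IN:

INT. KITCHEN - NIGHT

ANNA pours coffee.

EXT. STREET - DAY

Cars pass by.
"""
    for heading, body in split_scenes(sample):
        print(heading, "->", body)
```

Each (heading, body) pair is small enough to summarize on its own, which is what makes scene-by-scene iteration cheap.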
Multimodal Ingestion
- Supports PDF, DOCX, ODT, TXT and image uploads
- Transcribes audio files using OpenAI Whisper
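A minimal sketch of how uploads might be routed by file extension, assuming the libraries listed under Sources (pdfplumber, docx2txt, odfpy, openai) are installed. The reader functions and the extension table are hypothetical, since Scriptura's own helpers are not shown in this README; image uploads, which go to the captioning model, are left out. Third-party imports are deferred into each reader so the module still loads when a library is missing.

```python
from pathlib import Path


def read_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


def read_pdf(path: Path) -> str:
    import pdfplumber  # deferred import: only needed for PDFs

    with pdfplumber.open(str(path)) as pdf:
        # extract_text() can return None for image-only pages.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def read_docx(path: Path) -> str:
    import docx2txt

    return docx2txt.process(str(path))


def read_odt(path: Path) -> str:
    from odf import teletype, text
    from odf.opendocument import load

    doc = load(str(path))
    # Concatenate every paragraph element of the ODT document.
    return "\n".join(teletype.extractText(p) for p in doc.getElementsByType(text.P))


def transcribe_audio(path: Path) -> str:
    from openai import OpenAI  # the client reads OPENAI_API_KEY from the environment

    client = OpenAI()
    with path.open("rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text


# Hypothetical extension table; extend it for other formats as needed.
READERS = {
    ".txt": read_txt,
    ".pdf": read_pdf,
    ".docx": read_docx,
    ".odt": read_odt,
    ".mp3": transcribe_audio,
    ".wav": transcribe_audio,
}


def extract_text(file_path: str) -> str:
    """Return the text content (or transcript) of an uploaded file."""
    path = Path(file_path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix or '(none)'}")
    return reader(path)
```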
Image Generation
- On-the-fly storyboard and concept art creation via FLUX (black-forest-labs/FLUX.1-dev)
Audio Generation
- Produces original soundtracks and SFX with MusicGen (AudioCraft)
- Supports audio generation conditioned on an uploaded sample
Captioning & Metadata
- Auto-generates captions and descriptions for images using Gemma-3-27B-IT
Optional Web Research
- Queries DuckDuckGo to fetch example scripts, sound samples, or contextual references
Agent Flow
Here’s an example flow demonstrating how you could use the agent.
Code Overview
.
├── app.py # Entry point: defines Gradio interface and routing logic
├── system_prompt.txt # System-level prompt template for the CodeAgent
├── requirements.txt # Python dependencies (Gradio, SmolAgents, OpenAI, etc.)
└── README.md # Project documentation
app.py
- Agent class: loads the Qwen3-32B model and registers all tools
- respond(): connects the Gradio inputs to the CodeAgent
- @tool-decorated functions for image download, media generation, transcription, and captioning
- Gradio ChatInterface setup with text/file support and an “Enable web search” toggle
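The README does not show respond() itself. As a hedged sketch, assuming the ChatInterface runs in multimodal mode (so each message arrives as a dictionary with "text" and "files" keys) and that the web-search toggle is passed as an extra boolean input, the routing step might assemble one task string for the CodeAgent as follows. build_task and the prompt wording are hypothetical.

```python
from typing import Any, Dict, List, Union

FileEntry = Union[str, Dict[str, Any]]


def _file_path(entry: FileEntry) -> str:
    # Depending on the Gradio version, uploaded files may arrive as plain
    # path strings or as dicts carrying a "path" key; accept both.
    if isinstance(entry, dict):
        return str(entry.get("path", ""))
    return str(entry)


def build_task(message: Dict[str, Any], web_search: bool) -> str:
    """Turn a multimodal ChatInterface message into a single agent task (hypothetical)."""
    text = (message.get("text") or "").strip()
    files: List[FileEntry] = message.get("files") or []
    parts = [text] if text else []
    if files:
        paths = ", ".join(_file_path(f) for f in files)
        parts.append(f"Attached files (local paths): {paths}")
    if not web_search:
        # Respect the "Enable web search" toggle by telling the agent explicitly.
        parts.append("Do not use web search for this request.")
    return "\n\n".join(parts)
```

In a handler shaped like respond(message, history, web_search), the returned string could then be passed to a smolagents CodeAgent with agent.run(task). Keeping prompt assembly in a plain function makes it testable without loading any model.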
system_prompt.txt
- Injects the agent’s “way of thinking,” including reasoning structure and error handling
requirements.txt
- Lists all required libraries (Gradio, smolagents, OpenAI, huggingface_hub, pdfplumber, etc.)
Deployment & Access
Hugging Face Spaces
- Include app.py, system_prompt.txt, and requirements.txt in the root of your Space.
- Configure OPENAI_API_KEY and HF_TOKEN as Secrets in your Space’s settings.
- Make sure the Space is set to use Python 3.10 or higher.
- Select Gradio as the SDK (version 5.32.1).
- Pin or share the Space link to collaborate with your team.
Note: If you choose to clone this repository and run it locally, make sure to set your own OPENAI_API_KEY and HF_TOKEN environment variables before launching.
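A small fail-fast check avoids a confusing error halfway through a request. This sketch assumes only the two variable names given in the note above; the helper names missing_env_vars and require_env are hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Credentials named in this README; adjust if your fork needs others.
REQUIRED_VARS = ("OPENAI_API_KEY", "HF_TOKEN")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def require_env(names: Iterable[str] = REQUIRED_VARS) -> None:
    """Raise early, listing every missing variable, instead of failing mid-request."""
    missing = missing_env_vars(names)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling require_env() near the top of app.py, before the Gradio interface launches, reports all missing credentials at once. The same check works unchanged on Spaces, because Secrets are exposed to the app as environment variables.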
Use Cases
Independent Writer
- Upload a screenplay and quickly get a summary, a list of characters, and locations.
- Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).
- Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV).
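For conventionally formatted scripts, a first-pass list of characters and locations can be approximated without an LLM: locations come from sluglines, and characters from the all-caps cue lines that precede dialogue. The patterns and the name extract_entities below are hypothetical heuristics for illustration, not Scriptura's own method; transitions ending in a colon are skipped, but other all-caps lines (for example FADE OUT.) would be misread, so real scripts need more robust rules.

```python
import re
from typing import Dict, List

# Hypothetical heuristics. Group 1 of SLUGLINE is the location, with an
# optional trailing " - DAY"/" - NIGHT" style time marker stripped off.
SLUGLINE = re.compile(r"^(?:INT\./EXT\.|INT\.|EXT\.|I/E\.?)\s+(.+?)(?:\s+-\s+[^-]+)?$")
# A character cue: an all-caps line, optionally followed by an extension like (V.O.).
CUE = re.compile(r"^([A-Z][A-Z0-9 .'\-]*?)\s*(?:\([^)]*\))?$")


def extract_entities(script: str) -> Dict[str, List[str]]:
    """Return characters and locations in order of first appearance."""
    characters: List[str] = []
    locations: List[str] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        slug = SLUGLINE.match(line)
        if slug:
            location = slug.group(1).strip()
            if location not in locations:
                locations.append(location)
            continue
        if line.endswith(":"):
            # Transitions such as "FADE IN:" or "CUT TO:".
            continue
        cue = CUE.match(line)
        if cue:
            name = cue.group(1).strip()
            if name and name not in characters:
                characters.append(name)
    return {"characters": characters, "locations": locations}
```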
Film Production Company
- Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.
- Use the web search feature to find reference scripts or specific sound effects from free/paid sources.
- Develop visual storyboards and audio prototypes to share with directors, artists, and investors.
Translation and Adaptation Agency
- Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).
- Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).
- Produce reference audio via MusicGen to test culturally appropriate music for the target audience.
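The agency use case mentions JSON and CSV output for extracted entities. The README does not specify a schema, so this sketch assumes a hypothetical flat record per entity (kind, name, and the scene where it first appears) and serializes the same records both ways with the standard library.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Entity:
    # Hypothetical schema; Scriptura's actual output format is not documented here.
    kind: str         # e.g. "character" or "location"
    name: str
    first_scene: int  # 1-based index of the scene where the entity first appears


def to_json(entities: List[Entity]) -> str:
    # ensure_ascii=False keeps accented names from foreign-language scripts readable.
    return json.dumps([asdict(e) for e in entities], ensure_ascii=False, indent=2)


def to_csv(entities: List[Entity]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["kind", "name", "first_scene"])
    writer.writeheader()
    for e in entities:
        writer.writerow(asdict(e))
    return buffer.getvalue()
```

Deriving both formats from one dataclass keeps the JSON and CSV exports from drifting apart when a field is added.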
Digital Humanities Course
- Demonstrate how to build a text-mining tool applied to performing arts, combining NLP, image, and audio pipelines.
- Allow students to analyze real scripts, generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.
- Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.
Contributors:
- Code development and implementation by luke9705;
- Idea generation, testing, and video production by OrianIce;
- Research and testing by Loren1214;
- Code revisions by DDPM.
Sources
The following libraries, models, and tools power Scriptura’s agents and multimodal capabilities:
- Qwen3-32B – primary orchestrating LLM for high-level reasoning and workflow management
- Gradio – interactive web UI framework
- smolagents – lightweight multi-agent orchestrator from Hugging Face
- huggingface_hub – model & dataset management
- duckduckgo-search – optional web research integration
- openai – Whisper transcription, GPT-based reasoning
- anthropic – Claude-style LLM support
- pdfplumber – PDF text extraction
- docx2txt – DOCX parsing
- odfpy – ODT parsing
- pandas – data handling
- Pillow (PIL) – image processing
- requests – HTTP client for external APIs
- numpy – numerical operations
- MusicGen (AudioCraft) – soundtrack and SFX generation
- FLUX (black-forest-labs/FLUX.1-dev) – on-the-fly image generation
- Gemma-3-27B-IT – multimodal captioning and metadata