--- title: "Scriptura" short_description: "MultiAgent System for Screenplay Creation and Editing" emoji: 🎞️ colorFrom: blue colorTo: green sdk: gradio sdk_version: 5.32.1 app_file: app.py pinned: true license: mit tag: agent-demo-track --- # Scriptura: A MultiAgent System for Screenplay Creation and Editing The explanation video is available [here](https://www.youtube.com/watch?v=I0201ruB1Uo) The screenplay used in the video as sample is available [here](https://www.studiobinder.com/blog/best-free-movie-scripts-online/) ## Introduction **Scriptura** is a multi-agent AI framework based on HF-SmolAgents that streamlines the creation of screenplays, storyboards, and soundtracks by automating the stages of analysis, summarization, and multimodal enrichment—freeing authors to focus on pure creativity. At its heart: * Qwen3-32B serves as the primary orchestrating agent, coordinating workflows and managing high-level reasoning across the system. * Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, supporting both text and audio inputs to refine narrative elements and prepare them for downstream generation. For media generation, Scriptura integrates: * MusicGen models (per the AudioCraft MusicGen specification), deployed via Hugging Face Spaces, enabling the agent to produce original soundtracks and sound effects from text prompts or combined text + audio samples. * FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation, ideal for storyboards, concept art, and visual references that seamlessly tie into the narrative flow. Optionally, Scriptura can query external sources (e.g., via a DuckDuckGo API integration) to pull in reference scripts, sound samples, or research materials, ensuring that every draft is not only creatively rich but also contextually informed. --- ## Agent Capabilities Scriptura provides a rich set of agents and tools to cover the full screenplay production and enrichment pipeline: - **Text Analysis & Summarization** - Automatically extracts key themes, character arcs, and plot points - Segments and summarizes scenes for rapid iteration - **Multimodal Ingestion** - Supports PDF, DOCX, ODT, TXT and image uploads - Transcribes audio files using OpenAI Whisper - **Image Generation** - On-the-fly storyboard and concept art creation via FLUX (black-forest-labs/FLUX.1-dev) - **Audio Generation** - Produces original soundtracks and SFX with MusicGen (AudioCraft spec) - Allows sample-conditioned audio generation - **Captioning & Metadata** - Auto-generates captions and descriptions for images using Gemma-3-27B-IT - **Optional Web Research** - Queries DuckDuckGo to fetch example scripts, sound samples, or contextual references --- ## Agent Flow Here’s an example flow demonstrating how you could use the agent. ![image/png](https://cdn-uploads.huggingface.co/production/uploads/683eca9c72e8702dc425b51f/FFhfD2gCL-BjRC1eT-ELB.png) --- ## Code Overview ```bash . ├── app.py # Entry point: defines Gradio interface and routing logic ├── system_prompt.txt # System-level prompt template for the CodeAgent ├── requirements.txt # Python dependencies (Gradio, SmolAgents, OpenAI, etc.) └── README.md # Project documentation ``` * **app.py** * **Agent** class: loads Qwen3-32B model, registers all tools * **respond()**: orchestrates between Gradio inputs and CodeAgent * Decorated `@tool` functions for image download, media generation, transcription, captioning * Gradio `ChatInterface` setup with text/file support and “Enable web search” toggle * **system\_prompt.txt** * Injects the agent’s “way of thinking,” including reasoning structure and error handling * **requirements.txt** * Lists all required libraries (Gradio, SmolAgents, OpenAI, HuggingFace, PDFPlumber, etc.) --- ## Deployment & Access ### Hugging Face Spaces 1. Include `app.py`, `system_prompt.txt`, and `requirements.txt` in the root of your Space. 2. Configure `OPENAI_API_KEY` and `HF_TOKEN` as Secrets in your Space’s settings. 3. Make sure the Space is set to use **Python 3.10 or higher**. 4. Select **Gradio** as the SDK (version 5.32.1). 5. Pin or share the Space link to collaborate with your team. > **Note:** If you choose to clone this repository and run it locally, make sure to set your own `OPENAI_API_KEY` and `HF_TOKEN` environment variables before launching. --- ## Use Cases **Independent Writer** * Upload a screenplay and quickly get a summary, a list of characters, and locations. * Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs). * Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV). **Film Production Company** * Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues. * Use the web search feature to find reference scripts or specific sound effects from free/paid sources. * Develop visual storyboards and audio prototypes to share with directors, artists, and investors. **Translation and Adaptation Agency** * Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV). * Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX). * Produce reference audio via MusicGen to test culturally appropriate music for the target audience. **Digital Humanities Course** * Demonstrate how to build a text-mining tool applied to performing arts, combining NLP, image, and audio pipelines. * Allow students to analyze real scripts, generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment. * Explore Transformer models (DeepSeek), OCR, speech-to-text, and AI-driven media generation as part of the curriculum. --- ## Contributors: * Code development and implementation made by **luke9705**; * Ideas creation, testing and videomaking conducted by **OrianIce**; * Research and testing by **Loren1214**; * Code revisions by **DDPM**. --- ## Sources The following libraries, models, and tools power Scriptura’s agents and multimodal capabilities: - **Qwen3-32B** – primary orchestrating LLM for high-level reasoning and workflow management - **Gradio** – interactive web UI framework - **smolagents** – lightweight multi-agent orchestrator from Hugging Face - **huggingface_hub** – model & dataset management - **duckduckgo-search** – optional web research integration - **openai** – Whisper transcription, GPT-based reasoning - **anthropic** – Claude-style LLM support - **pdfplumber** – PDF text extraction - **docx2txt** – DOCX parsing - **odfpy** – ODT parsing - **pandas** – data handling - **Pillow (PIL)** – image processing - **requests** – HTTP client for external APIs - **numpy** – numerical operations - **MusicGen (AudioCraft)** – soundtrack and SFX generation - **FLUX (black-forest-labs/FLUX.1-dev)** – on-the-fly image generation - **Gemma-3-27B-IT** – multimodal captioning and metadata