---
title: "Scriptura"
short_description: "MultiAgent System for Screenplay Creation and Editing"
emoji: 🎞️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: true
license: mit
tags:
  - agent-demo-track
---

# Scriptura: A MultiAgent System for Screenplay Creation and Editing

An explanation video is available [here](https://www.youtube.com/watch?v=I0201ruB1Uo).

The sample screenplay used in the video is available [here](https://www.studiobinder.com/blog/best-free-movie-scripts-online/).

## Introduction

**Scriptura** is a multi-agent AI framework built on Hugging Face smolagents. It streamlines the creation of screenplays, storyboards, and soundtracks by automating analysis, summarization, and multimodal enrichment, so that authors can focus on the creative work.

At its heart:

* Qwen3-32B serves as the primary orchestrating agent, coordinating workflows and managing high-level reasoning across the system.
* Gemma-3-27B-IT acts as a specialized assistant for multimodal tasks, supporting both text and image inputs to refine narrative elements and prepare them for downstream generation.

For media generation, Scriptura integrates:

* MusicGen models (per the AudioCraft MusicGen specification), deployed via Hugging Face Spaces, enabling the agent to produce original soundtracks and sound effects from text prompts or combined text + audio samples.
* FLUX (black-forest-labs/FLUX.1-dev) for on-the-fly image creation, ideal for storyboards, concept art, and visual references that seamlessly tie into the narrative flow.

Optionally, Scriptura can query external sources (e.g., via a DuckDuckGo API integration) to pull in reference scripts, sound samples, or research materials, ensuring that every draft is not only creatively rich but also contextually informed.

---

## Agent Capabilities

Scriptura provides a rich set of agents and tools to cover the full screenplay production and enrichment pipeline:

- **Text Analysis & Summarization**  
  - Automatically extracts key themes, character arcs, and plot points  
  - Segments and summarizes scenes for rapid iteration  

- **Multimodal Ingestion**  
  - Supports PDF, DOCX, ODT, TXT and image uploads  
  - Transcribes audio files using OpenAI Whisper  

- **Image Generation**  
  - On-the-fly storyboard and concept art creation via FLUX (black-forest-labs/FLUX.1-dev)  

- **Audio Generation**  
  - Produces original soundtracks and SFX with MusicGen (AudioCraft spec)  
  - Allows sample-conditioned audio generation  

- **Captioning & Metadata**  
  - Auto-generates captions and descriptions for images using Gemma-3-27B-IT  

- **Optional Web Research**  
  - Queries DuckDuckGo to fetch example scripts, sound samples, or contextual references  
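
The multimodal ingestion step listed above can be pictured as a dispatch on file extension. The following is a minimal sketch, not the code in `app.py`: the helper name `extract_text` and the set of image extensions are assumptions made for illustration. The parsers (`pdfplumber`, `docx2txt`, `odfpy`) are imported lazily, so only the library needed for a given file type has to be installed.

```python
from pathlib import Path

# Assumed set of image extensions; the real app may accept others.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def extract_text(path: str) -> str:
    """Return the plain text of a supported document (hypothetical helper)."""
    suffix = Path(path).suffix.lower()

    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")

    if suffix == ".pdf":
        import pdfplumber  # lazy import: only needed for PDFs

        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for pages without a text layer
            return "\n".join(page.extract_text() or "" for page in pdf.pages)

    if suffix == ".docx":
        import docx2txt  # lazy import: only needed for DOCX

        return docx2txt.process(path)

    if suffix == ".odt":
        from odf import teletype, text  # lazy import: only needed for ODT
        from odf.opendocument import load

        document = load(path)
        paragraphs = document.getElementsByType(text.P)
        return "\n".join(teletype.extractText(p) for p in paragraphs)

    if suffix in IMAGE_EXTENSIONS:
        # Images have no text to extract here; they belong to the captioning path.
        raise ValueError(f"{path} is an image; route it to the captioning tool")

    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Raising on images instead of returning an empty string keeps the text path and the captioning path clearly separated; a caller can catch the `ValueError` and hand the file to the image tools.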

---

## Agent Flow

Here’s an example flow demonstrating how you could use the agent.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/683eca9c72e8702dc425b51f/FFhfD2gCL-BjRC1eT-ELB.png)

---

## Code Overview

```text
.
├── app.py               # Entry point: defines Gradio interface and routing logic
├── system_prompt.txt    # System-level prompt template for the CodeAgent
├── requirements.txt     # Python dependencies (Gradio, SmolAgents, OpenAI, etc.)
└── README.md            # Project documentation
```

* **app.py**

  * **Agent** class: loads the Qwen3-32B model and registers all tools
  * **respond()**: passes Gradio inputs to the CodeAgent and returns its reply
  * `@tool`-decorated functions for image download, media generation, transcription, and captioning
  * Gradio `ChatInterface` setup with text/file support and an “Enable web search” toggle

* **system\_prompt.txt**

  * Injects the agent’s “way of thinking,” including reasoning structure and error handling

* **requirements.txt**

  * Lists all required libraries (Gradio, SmolAgents, OpenAI, HuggingFace, PDFPlumber, etc.)
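
To make the role of `respond()` more concrete, here is a minimal sketch of how a chat callback might turn a Gradio message into a single task for the agent. It assumes the interface runs in Gradio's multimodal mode, where the message arrives as a dictionary with `"text"` and `"files"` keys, and that the web search toggle is passed as an additional input. The names `build_task` and `make_respond`, and the wording of the task string, are hypothetical and not taken from `app.py`.

```python
from typing import Any, Callable


def build_task(message: dict[str, Any], web_search: bool) -> str:
    """Combine text, attached file paths, and the toggle into one task string."""
    parts = [(message.get("text") or "").strip()]

    files = message.get("files") or []
    if files:
        listing = "\n".join(f"- {path}" for path in files)
        parts.append(f"Attached files:\n{listing}")

    # Make the toggle visible to the agent; this wording is an assumption.
    parts.append(f"Web search enabled: {'yes' if web_search else 'no'}")
    return "\n\n".join(part for part in parts if part)


def make_respond(run_agent: Callable[[str], str]):
    """Wrap any 'task string in, reply string out' callable as a chat callback."""

    def respond(message: dict[str, Any], history: list, web_search: bool = False) -> str:
        # history is accepted to match the callback shape but is unused here
        return run_agent(build_task(message, web_search))

    return respond
```

Injecting `run_agent` keeps the sketch independent of any particular agent library; in the real app that callable would be whatever runs the CodeAgent on the task.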

---

## Deployment & Access

### Hugging Face Spaces

1. Include `app.py`, `system_prompt.txt`, and `requirements.txt` in the root of your Space.  
2. Configure `OPENAI_API_KEY` and `HF_TOKEN` as Secrets in your Space’s settings.  
3. Make sure the Space is set to use **Python 3.10 or higher**.  
4. Select **Gradio** as the SDK (version 5.32.1).  
5. Pin or share the Space link to collaborate with your team.

> **Note:** If you choose to clone this repository and run it locally, make sure to set your own `OPENAI_API_KEY` and `HF_TOKEN` environment variables before launching.
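
For local runs, a small pre-launch check can catch missing credentials early. This is a sketch under the assumption that only the two variables named above are required; the helper name `missing_secrets` is hypothetical.

```python
import os
from collections.abc import Mapping

# The two variables named in the deployment steps above.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "HF_TOKEN")


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

Calling `missing_secrets()` before launching the app and aborting when the returned list is non-empty gives a clearer error than a failed API call later on.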

---

## Use Cases

**Independent Writer**  
* Upload a screenplay and quickly get a summary, a list of characters, and locations.  
* Create visual storyboards of key narrative moments via FLUX (PNG/JPEG outputs).  
* Generate brief soundtracks or sound effects to accompany script presentations (MP3/WAV).

**Film Production Company**  
* Import multiple screenplays (PDF, DOCX) and automatically receive reports on characters, locations, and potential copyright issues.  
* Use the web search feature to find reference scripts or specific sound effects from free/paid sources.  
* Develop visual storyboards and audio prototypes to share with directors, artists, and investors.

**Translation and Adaptation Agency**  
* Upload foreign-language scripts and obtain a structured text version with extracted entities (JSON/CSV).  
* Generate contextual images for cultural adaptation (e.g., images matching the original setting via FLUX).  
* Produce reference audio via MusicGen to test culturally appropriate music for the target audience.

**Digital Humanities Course**  
* Demonstrate how to build a text-mining tool applied to performing arts, combining NLP, image, and audio pipelines.  
* Allow students to analyze real scripts, generate abstracts, scene maps, and visual/audio prototypes in a hands-on environment.  
* Explore Transformer models (such as Qwen3-32B and Gemma-3-27B-IT), OCR, speech-to-text, and AI-driven media generation as part of the curriculum.
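
The structured entity output mentioned in the translation use case (JSON/CSV) could take a shape like the one below. This is only an illustration: the record fields (`name`, `type`, `first_scene`) and the sample values are hypothetical, and the format Scriptura actually produces may differ.

```python
import csv
import io
import json

# Hypothetical entity records; field names and values are illustrative only.
entities = [
    {"name": "MARTA", "type": "character", "first_scene": 1},
    {"name": "Lisbon harbor", "type": "location", "first_scene": 2},
]


def to_json(records: list[dict]) -> str:
    """Serialize records as pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: list[dict]) -> str:
    """Serialize records as CSV, using the keys of the first record as columns."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

JSON suits nested data such as per-scene appearances, while CSV is convenient when the entity list is reviewed in a spreadsheet.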

---

## Contributors

* Code development and implementation by **luke9705**;
* Ideation, testing, and video production by **OrianIce**;
* Research and testing by **Loren1214**;
* Code revisions by **DDPM**.

---

## Sources

The following libraries, models, and tools power Scriptura’s agents and multimodal capabilities:

- **Qwen3-32B** – primary orchestrating LLM for high-level reasoning and workflow management  
- **Gradio** – interactive web UI framework  
- **smolagents** – lightweight multi-agent orchestrator from Hugging Face  
- **huggingface_hub** – model & dataset management  
- **duckduckgo-search** – optional web research integration  
- **openai** – Whisper transcription, GPT-based reasoning  
- **anthropic** – Claude-style LLM support  
- **pdfplumber** – PDF text extraction  
- **docx2txt** – DOCX parsing  
- **odfpy** – ODT parsing  
- **pandas** – data handling  
- **Pillow (PIL)** – image processing  
- **requests** – HTTP client for external APIs  
- **numpy** – numerical operations  
- **MusicGen (AudioCraft)** – soundtrack and SFX generation  
- **FLUX (black-forest-labs/FLUX.1-dev)** – on-the-fly image generation  
- **Gemma-3-27B-IT** – multimodal captioning and metadata