Revanth-ml committed on
Commit 60a8cf4 · verified · 1 Parent(s): 2e904c8

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -22,7 +22,7 @@ short_description: 'Turn any document into Interactive presentation '
 ---
 ### 🚀 Watch the Demo!
 
-* **🎥 Watch the Video Walkthrough on YouTube:** [[Link to your YouTube Video](https://www.youtube.com/watch?v=qBfyChj9_Q0&ab_channel=Revanth)]
+* **🎥 Watch the Video Walkthrough on YouTube:** [[Link to the YouTube Video](https://www.youtube.com/watch?v=qBfyChj9_Q0&ab_channel=Revanth)]
 ---
 This project is a submission for the **Hugging Face & Gradio Agents & MCP Hackathon**. It demonstrates a powerful, multi-tool agentic pipeline that handles everything from creative direction to asset generation.
 ---
@@ -64,11 +64,17 @@ This project was built within the tight timeframe of a hackathon. Here are a few
 
 ### 2. Presentation Generation Speed
 
-* **The Issue:** The final step, where the agent builds the HTML code, can take some time (2-3 minutes).
+* **The Issue:** The final step, where the agent builds the HTML code, can take some time (4-6 minutes).
 * **The Cause:** This step deliberately uses a large, powerful reasoning model (**DeepSeek-R1-0528** via Nebius) to act as an expert front-end developer. This model's strength is its high-quality, complex code generation, which comes at the cost of higher latency. This was a conscious trade-off to prioritize the *quality* of the final presentation over raw speed.
 * **The Workaround:** Be patient and watch the logs! The UI provides real-time feedback so you know the agent is hard at work "thinking" and "coding" your presentation.
 * **Roadmap:** The ideal enhancement would be to **stream the model's output**. Instead of waiting for the full HTML file, the code would appear in the "Raw HTML Code" tab token-by-token, creating an amazing "live coding" effect. This would dramatically improve the perceived performance and user experience.
 
+### 3. Audio Generation Latency
+
+* **The Issue:** Generating the audio narration for each slide can feel slow, with each individual audio file taking around 15 seconds to create.
+* **The Cause:** The audio is generated by a high-quality Text-to-Speech (TTS) model deployed on a **CPU instance** on Modal. For cost-efficiency and broad accessibility during the hackathon, this CPU-based approach was chosen. While reliable, CPU-based inference for speech synthesis is significantly slower than its GPU-accelerated counterpart.
+* **The Workaround:** The UI is designed to be fully asynchronous. You don't have to wait for all audio to finish before interacting with the rest of the generated presentation. The audio players for each slide will appear in the "Speaker Notes Audio" tab as soon as they are ready.
+* **Roadmap:** The path to near-instant audio generation involves migrating the TTS model to a GPU-based environment. By leveraging a **Hugging Face Space with a GPU upgrade** (like a T4 or A10G) or a dedicated GPU endpoint on Modal, the inference time per slide could be reduced from ~15 seconds to just **1-2 seconds**.
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
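
**Sketch: streaming the HTML generation.** The "Roadmap" item under section 2 of the diff proposes streaming the model's output token-by-token. Here is a minimal, hypothetical prototype of that idea: a Gradio generator function that forwards tokens from an OpenAI-compatible streaming endpoint (Nebius exposes one) into the "Raw HTML Code" tab. The base URL, environment variable name, and prompt are assumptions, not the project's actual configuration.

```python
import os

import gradio as gr
from openai import OpenAI

# Assumed endpoint and credential name; Nebius AI Studio is OpenAI-compatible.
client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ["NEBIUS_API_KEY"],
)

def stream_presentation_html(outline: str):
    """Generator: yield the HTML accumulated so far after every token."""
    stream = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-0528",  # the model named in the README
        messages=[{"role": "user", "content": f"Write slide HTML for:\n{outline}"}],
        stream=True,
    )
    html = ""
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            html += chunk.choices[0].delta.content
            yield html  # Gradio re-renders the component on every yield

with gr.Blocks() as demo:
    outline = gr.Textbox(label="Document outline")
    raw_html = gr.Code(label="Raw HTML Code", language="html")
    outline.submit(stream_presentation_html, outline, raw_html)

demo.launch()
```

Because the handler is a generator, Gradio streams each `yield` to the browser, which is exactly the "live coding" effect the roadmap describes.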
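
**Sketch: audio players appearing as they finish.** The "Workaround" under section 3 describes a UI that attaches each slide's audio as soon as it is ready. The underlying pattern can be sketched with a plain thread pool that yields results in completion order; `synthesize_speech` here is a stand-in for the project's actual Modal TTS call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def synthesize_speech(notes: str) -> str:
    """Stand-in for the Modal TTS call (~15 s per slide on CPU)."""
    time.sleep(1)  # simulate inference latency
    return f"/tmp/slide_{abs(hash(notes))}.wav"  # pretend output path

def generate_narration(slide_notes: list[str]):
    """Yield (slide_index, wav_path) pairs in completion order, so the UI
    can attach each audio player the moment its file exists."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            pool.submit(synthesize_speech, notes): i
            for i, notes in enumerate(slide_notes)
        }
        for future in as_completed(futures):
            yield futures[future], future.result()

for idx, path in generate_narration(["Intro notes", "Slide 2 notes"]):
    print(f"slide {idx} ready: {path}")
```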
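
**Sketch: moving the TTS to a GPU on Modal.** For the GPU roadmap item, the migration on Modal is essentially a one-line change: request a GPU in the function decorator. Everything below is illustrative; the checkpoint (`suno/bark-small`) and the function body are stand-ins for whatever model the Space actually deploys.

```python
import modal

app = modal.App("slide-narration-tts")
image = modal.Image.debian_slim().pip_install("torch", "transformers", "soundfile")

# The CPU deployment described above would simply omit gpu=.
@app.function(gpu="T4", image=image)
def synthesize(text: str) -> bytes:
    """Render one slide's narration to WAV bytes on a GPU worker."""
    import io

    import soundfile as sf
    from transformers import pipeline

    tts = pipeline("text-to-speech", model="suno/bark-small", device=0)
    speech = tts(text)
    buf = io.BytesIO()
    sf.write(buf, speech["audio"].squeeze(), speech["sampling_rate"], format="WAV")
    return buf.getvalue()
```

In a real deployment the model would be loaded once per container (for example via `@modal.enter()` on an `@app.cls()` class) rather than on every call, since model loading, not synthesis, would otherwise dominate the hoped-for 1-2 second budget.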