{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# French SSML Cascade Models Demo\n", "\n", "**Interactive demonstration of French SSML cascade models for improved text-to-speech prosody control.**\n", "\n", "This notebook demonstrates the complete pipeline from plain French text to rich SSML markup with prosody control.\n", "\n", "## 🧩 Pipeline Overview\n", "\n", "1. **Text-to-Breaks**: Predicts natural pause locations \n", "2. **Breaks-to-SSML**: Adds prosody control (pitch, rate, volume) and precise timing\n", "\n", "📄 **Paper**: *Improving Synthetic Speech Quality via SSML Prosody Control* (ICNLSP 2025) \n", "🔗 **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/ \n", "📚 **Models**: [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora) • [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🚀 Setup\n", "\n", "### Step 1: Mount Google Drive" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "a1jNj9uK7EoL", "outputId": "76624289-061f-4700-e397-50da9da9ee6d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mounted at /content/drive\n" ] } ], "source": [ "from google.colab import drive\n", "drive.mount('/content/drive', force_remount=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Clone Repository" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eE3iUaX_7OLG", "outputId": "d621b296-b12f-489a-bc1f-c7240c21646b" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n", "chdir: error retrieving current 
directory: getcwd: cannot access parent directories: No such file or directory\n", "Cloning into 'cascading_model'...\n" ] } ], "source": [ "%%bash\n", "cd /content/drive/MyDrive/\n", "git clone https://github.com/TimLukaHorstmann/cascading_model.git" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vItNbMvh7ZNL", "outputId": "31a31144-1261-4427-9d2e-089ae17689b2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/content/drive/MyDrive/cascading_model\n" ] } ], "source": [ "%cd /content/drive/MyDrive/cascading_model/\n" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JdeuCOX_7kae", "outputId": "f8bad5e1-92d0-4531-fbe0-ca2f29a8efd8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "breaks2ssml_inference.py\n", "demo.py\n", "empty_ssml_creation.py\n", "__init__.py\n", "pyproject.toml\n", "README.md\n", "requirements.txt\n", "shared_models.py\n", "test_models.py\n", "text2breaks_inference.py\n" ] } ], "source": [ "%%bash\n", "ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🧪 Testing & Demo\n", "\n", "### Step 3: Verify Installation" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eaBx_eh-819B", "outputId": "2c55f4fa-f17e-49b8-b032-74d670dcd34a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2025-08-06 12:36:48.453347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", "E0000 00:00:1754483808.475278 35366 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been 
registered\n", "E0000 00:00:1754483808.481612 35366 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", "============================================================\n", "🧪 French SSML Models - Test Suite\n", "============================================================\n", "🔍 Testing imports...\n", " ✅ PyTorch 2.5.1+cu121\n", " ✅ Transformers 4.54.0\n", " ✅ PEFT 0.16.0\n", " ✅ All imports successful!\n", "\n", "🔧 Testing model loading...\n", " Loading text2breaks model...\n", "Loading checkpoint shards: 100% 4/4 [01:33<00:00, 23.46s/it]\n", " ✅ Text2breaks model loaded\n", " Loading breaks2ssml model...\n", " ✅ Breaks2ssml model loaded\n", " ✅ All models loaded successfully!\n", "\n", "🧪 Testing inference...\n", " Input: Bonjour comment allez-vous ?\n", " Testing text2breaks...\n", "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n", " Step 1 result: Bonjour comment allez-vous ?\n", " Testing breaks2ssml...\n", "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n", " Step 2 result: \n", " Bonjour comment allez-vous ?\n", " \n", " \n", " ✅ Inference test successful!\n", "\n", "🔗 Testing full cascade...\n", " Input: Bonsoir comment ça va ?\n", " Cascade result: \n", " Bonsoir comment ça va ?\n", " \n", " \n", " ✅ Cascade test successful!\n", "\n", "============================================================\n", "🎉 All tests passed! 
The models are working correctly.\n", "============================================================\n", "\n", "You can now use:\n", " - python demo.py (for examples)\n", " - python demo.py --interactive (for interactive mode)\n", " - python text2breaks_inference.py --interactive\n", " - python breaks2ssml_inference.py --interactive\n" ] } ], "source": [ "!python test_models.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Interactive Demo\n", "\n", "Run the interactive demo to test the models with your own French text:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZIeUY9atUhvV", "outputId": "581f1395-fa70-424f-9c66-50b5e44547c3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2025-08-06 12:21:35.541051: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", "E0000 00:00:1754482895.561958 31169 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "E0000 00:00:1754482895.568312 31169 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", "================================================================================\n", "Interactive French SSML Cascade\n", "================================================================================\n", "\n", "Choose mode:\n", "1. Full cascade (text → breaks → SSML)\n", "2. Text to breaks only\n", "3. 
Breaks to SSML only\n", "\n", "Select mode (1-3): 1\n", "\n", "Initializing models...\n", "Loading checkpoint shards: 100% 4/4 [01:30<00:00, 22.70s/it]\n", "Models loaded successfully!\n", "\n", "Enter French text (empty line to exit):\n", "\n", "> Je suis Luka.\n", "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n", "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n", "Output: \n", " Je suis Luka.\n", " \n", " \n", "Time: 6.55s\n", "\n", "> Trés bien.\n", "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n", "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n", "Output: \n", " Trés bien.\n", " \n", " \n", "Time: 5.64s\n", "\n", "> Je suis Bertrand Perier. Je suis avocat et vous écoutez ma masterclass.\n", "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n", "The following generation flags are not valid and may be ignored: ['temperature']. 
Set `TRANSFORMERS_VERBOSITY=info` for more details.\n", "Output: \n", " Je suis Bertrand Perier.\n", " \n", " \n", "\n", " \n", " Je suis avocat et vous écoutez ma masterclass.\n", " \n", " \n", "Time: 12.11s\n", "\n", "> Exception ignored in: \n", "Traceback (most recent call last):\n", " File \"/usr/lib/python3.11/threading.py\", line 1541, in _shutdown\n", " def _shutdown():\n", " \n", "KeyboardInterrupt: \n" ] } ], "source": [ "!python demo.py --interactive" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🎯 Example Usage\n", "\n", "```python\n", "from breaks2ssml_inference import CascadedInference\n", "\n", "# Initialize the full cascade\n", "cascade = CascadedInference()\n", "\n", "# Convert plain French text to SSML\n", "text = \"Bonjour comment allez-vous aujourd'hui ?\"\n", "result = cascade.predict(text)\n", "print(result)\n", "```\n", "\n", "**Expected Output:**\n", "```xml\n", "Bonjour comment allez-vous aujourd'hui ?\n", "```\n", "\n", "## 📚 Resources\n", "\n", "- **Audio Demos**: https://horstmann.tech/ssml-prosody-control/\n", "- **GitHub Repository**: https://github.com/TimLukaHorstmann/cascading_model\n", "- **Stage 1 Model**: https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora\n", "- **Stage 2 Model**: https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora\n", "\n", "---\n", "*Hi! Paris - Interdisciplinary Research Institute for Artificial Intelligence*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }