{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# French SSML Cascade Models Demo\n",
"\n",
"\n",
"**Interactive demonstration of French SSML cascade models for improved text-to-speech prosody control.**\n",
"\n",
"This notebook demonstrates the complete pipeline from plain French text to rich SSML markup with prosody control.\n",
"\n",
"## 🧩 Pipeline Overview\n",
"\n",
"1. **Text-to-Breaks**: Predicts natural pause locations \n",
"2. **Breaks-to-SSML**: Adds prosody control (pitch, rate, volume) and precise timing\n",
"\n",
"**Paper**: *Improving Synthetic Speech Quality via SSML Prosody Control* (ICNLSP 2025) \n",
"**Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/ \n",
"**Models**: [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora) • [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Step 1: Mount Google Drive"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "a1jNj9uK7EoL",
"outputId": "76624289-061f-4700-e397-50da9da9ee6d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mounted at /content/drive\n"
]
}
],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive', force_remount=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Clone Repository"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "eE3iUaX_7OLG",
"outputId": "d621b296-b12f-489a-bc1f-c7240c21646b"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",
"chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",
"Cloning into 'cascading_model'...\n"
]
}
],
"source": [
"%%bash\n",
"cd /content/drive/MyDrive/\n",
"git clone https://github.com/TimLukaHorstmann/cascading_model.git"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vItNbMvh7ZNL",
"outputId": "31a31144-1261-4427-9d2e-089ae17689b2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/content/drive/MyDrive/cascading_model\n"
]
}
],
"source": [
"%cd /content/drive/MyDrive/cascading_model/\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JdeuCOX_7kae",
"outputId": "f8bad5e1-92d0-4531-fbe0-ca2f29a8efd8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"breaks2ssml_inference.py\n",
"demo.py\n",
"empty_ssml_creation.py\n",
"__init__.py\n",
"pyproject.toml\n",
"README.md\n",
"requirements.txt\n",
"shared_models.py\n",
"test_models.py\n",
"text2breaks_inference.py\n"
]
}
],
"source": [
"%%bash\n",
"ls"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🧪 Testing & Demo\n",
"\n",
"### Step 3: Verify Installation"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "eaBx_eh-819B",
"outputId": "2c55f4fa-f17e-49b8-b032-74d670dcd34a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2025-08-06 12:36:48.453347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
"E0000 00:00:1754483808.475278 35366 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"E0000 00:00:1754483808.481612 35366 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"============================================================\n",
"🧪 French SSML Models - Test Suite\n",
"============================================================\n",
"Testing imports...\n",
" ✅ PyTorch 2.5.1+cu121\n",
" ✅ Transformers 4.54.0\n",
" ✅ PEFT 0.16.0\n",
" ✅ All imports successful!\n",
"\n",
"Testing model loading...\n",
" Loading text2breaks model...\n",
"Loading checkpoint shards: 100% 4/4 [01:33<00:00, 23.46s/it]\n",
" ✅ Text2breaks model loaded\n",
" Loading breaks2ssml model...\n",
" ✅ Breaks2ssml model loaded\n",
" ✅ All models loaded successfully!\n",
"\n",
"🧪 Testing inference...\n",
" Input: Bonjour comment allez-vous ?\n",
" Testing text2breaks...\n",
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
" Step 1 result: Bonjour comment allez-vous ?\n",
" Testing breaks2ssml...\n",
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
" Step 2 result: \n",
" Bonjour comment allez-vous ?\n",
" \n",
" \n",
" ✅ Inference test successful!\n",
"\n",
"Testing full cascade...\n",
" Input: Bonsoir comment ça va ?\n",
" Cascade result: \n",
" Bonsoir comment ça va ?\n",
" \n",
" \n",
" ✅ Cascade test successful!\n",
"\n",
"============================================================\n",
"All tests passed! The models are working correctly.\n",
"============================================================\n",
"\n",
"You can now use:\n",
" - python demo.py (for examples)\n",
" - python demo.py --interactive (for interactive mode)\n",
" - python text2breaks_inference.py --interactive\n",
" - python breaks2ssml_inference.py --interactive\n"
]
}
],
"source": [
"!python test_models.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4: Interactive Demo\n",
"\n",
"Run the interactive demo to test the models with your own French text:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZIeUY9atUhvV",
"outputId": "581f1395-fa70-424f-9c66-50b5e44547c3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2025-08-06 12:21:35.541051: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
"E0000 00:00:1754482895.561958 31169 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"E0000 00:00:1754482895.568312 31169 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"================================================================================\n",
"Interactive French SSML Cascade\n",
"================================================================================\n",
"\n",
"Choose mode:\n",
"1. Full cascade (text → breaks → SSML)\n",
"2. Text to breaks only\n",
"3. Breaks to SSML only\n",
"\n",
"Select mode (1-3): 1\n",
"\n",
"Initializing models...\n",
"Loading checkpoint shards: 100% 4/4 [01:30<00:00, 22.70s/it]\n",
"Models loaded successfully!\n",
"\n",
"Enter French text (empty line to exit):\n",
"\n",
"> Je suis Luka.\n",
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
"Output: \n",
" Je suis Luka.\n",
" \n",
" \n",
"Time: 6.55s\n",
"\n",
"> Trés bien.\n",
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
"Output: \n",
" Trés bien.\n",
" \n",
" \n",
"Time: 5.64s\n",
"\n",
"> Je suis Bertrand Perier. Je suis avocat et vous écoutez ma masterclass.\n",
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
"Output: \n",
" Je suis Bertrand Perier.\n",
" \n",
" \n",
"\n",
" \n",
" Je suis avocat et vous écoutez ma masterclass.\n",
" \n",
" \n",
"Time: 12.11s\n",
"\n",
"> Exception ignored in: \n",
"Traceback (most recent call last):\n",
" File \"/usr/lib/python3.11/threading.py\", line 1541, in _shutdown\n",
" def _shutdown():\n",
" \n",
"KeyboardInterrupt: \n"
]
}
],
"source": [
"!python demo.py --interactive"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example Usage\n",
"\n",
"```python\n",
"from breaks2ssml_inference import CascadedInference\n",
"\n",
"# Initialize the full cascade\n",
"cascade = CascadedInference()\n",
"\n",
"# Convert plain French text to SSML\n",
"text = \"Bonjour comment allez-vous aujourd'hui ?\"\n",
"result = cascade.predict(text)\n",
"print(result)\n",
"```\n",
"\n",
"**Expected Output:**\n",
"```xml\n",
"Bonjour comment allez-vous aujourd'hui ?\n",
"```\n",
"\n",
"## Resources\n",
"\n",
"- **Audio Demos**: https://horstmann.tech/ssml-prosody-control/\n",
"- **GitHub Repository**: https://github.com/TimLukaHorstmann/cascading_model\n",
"- **Stage 1 Model**: https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora\n",
"- **Stage 2 Model**: https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora\n",
"\n",
"---\n",
"*Hi! Paris - Interdisciplinary Research Institute for Artificial Intelligence*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}