Updated model + inference + model card + colab

Files changed (9)

README.md +113 -94
added_tokens.json +0 -24
chat_template.jinja +0 -54
merges.txt +0 -0
notebook.ipynb +378 -0
special_tokens_map.json +0 -31
tokenizer.json +0 -3
tokenizer_config.json +0 -207
vocab.json +0 -0

README.md CHANGED

@@ -1,153 +1,172 @@
 ---
 license: apache-2.0
 base_model: Qwen/Qwen2.5-7B
 library_name: peft
 tags:
-- text-to-speech
 - ssml
-- french
 - qwen2.5
-- lora
----
-# 🗣️ ssml-break2ssml-fr-lora
-This is the second-stage LoRA adapter for **French SSML generation**, converting *pause-annotated text* into full SSML markup with `<break>` tags.
-This model is part of the cascade described in the paper:
-**"Improving French Synthetic Speech Quality via SSML Prosody Control"**
-Nassima Ould-Ouali, Éric Moulines – *ICNLSP 2025 (Springer LNCS)* [accepted].
 ---
-## 🧠 Model Details
-- **Base model**: [`Qwen/Qwen2.5-7B`](https://huggingface.co/Qwen/Qwen2.5-7B)
-- **Adapter method**: LoRA (Low-Rank Adaptation via [`peft`](https://github.com/huggingface/peft))
-- **LoRA rank**: 8 — **Alpha**: 16
-- **Training**: 5 epochs, batch size 1 (gradient accumulation)
-- **Languages**: French
-- **Model size**: 7B (adapter-only)
-- **License**: Apache 2.0
----
 ## 🧩 Pipeline Overview
-This model is part of a two-stage SSML cascade for improving French TTS prosody:
-| Step | Model                                     | Description                                  |
-|------|-------------------------------------------|----------------------------------------------|
-| 1️⃣   | `nassimaODL/ssml-text2breaks-fr-lora`     | Inserts symbolic pauses like `#250`, `#500`  |
-| 2️⃣   | `nassimaODL/ssml-break2ssml-fr-lora`      | Converts symbols to `<break time="..."/>` SSML |
 ## ✨ Example
-```text
-Input:  Bonjour#250 comment vas-tu ?
-Output: Bonjour<break time="250ms"/> comment vas-tu ?
 ```
----
-## 🚀 How to run the code
-```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from peft import PeftModel
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
-base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", device_map="auto")
-model = PeftModel.from_pretrained(base_model, "nassimaODL/ssml-break2ssml-fr-lora")
-input_text = "Bonjour#250 comment vas-tu ?"
-inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
-with torch.no_grad():
-    outputs = model.generate(**inputs, max_new_tokens=128)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
----
-## 🧪 Evaluation Summary
-| Metric                    | Value         |
-|--------------------------|---------------|
-| Pause Insertion Accuracy | 87.3%         |
-| RMSE (pause duration)    | 98.5 ms       |
-| MOS gain (vs. baseline)  | +0.42         |
-Evaluation was performed on a held-out French validation set with annotated SSML pauses. Mean Opinion Score (MOS) improvements were assessed using TTS outputs rendered with Azure Henri voice and rated by 30 native French speakers.
----
-## 📚  Training Data
-This LoRA adapter was trained on a corpus of ~4,500 French utterances. Input texts were annotated with symbolic pause indicators (e.g., `#250` for 250ms), automatically aligned using a combination of Whisper-Kyutai timestamping and F0/syntactic heuristics.
-Annotations were refined via a hybrid heuristic rule set combining:
-- Voice activity boundaries (via Auditok)
-- F0 contour analysis (pitch dips before breaks)
-- Syntactic cues (punctuation, conjunctions)
-For full details, see our data preparation pipeline on GitHub:
-🔗 [https://github.com/NassimaOULDOUALI/Prosody-Control-French-TTS](https://github.com/NassimaOULDOUALI/Prosody-Control-French-TTS)
----
-## ⚙️ Training Setup
-- **Compute**: Jean-Zay (GENCI/IDRIS), A100 80GB x1
-- **Framework**: HuggingFace `transformers` + `peft`
-- **LoRA method**: rank = 8, alpha = 16, dropout = 0.05
-- **Precision**: bf16
-- **Max sequence length**: 768 tokens (256 input + 512 output)
-- **Epochs**: 5
-- **Optimizer**: AdamW (lr = 2e-4, no warmup)
-- **LoRA target modules**:
-  `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
-Training was performed using the [Unsloth](https://github.com/unslothai/unsloth) SFTTrainer and PEFT adapter injection on Qwen2.5-7B base.
----
-## ⚠️  Limitations
-- Only `<break>` tags are supported; no pitch, rate, or emphasis control yet.
-- Pause accuracy is sensitive to punctuation and malformed inputs.
-- SSML output has been optimized primarily for Azure voices (e.g., `fr-FR-HenriNeural`). Other engines may interpret `<break>` tags differently.
-- The model assumes the presence of symbolic pause markers in the input (e.g., `#250`). For automatic prediction of such symbols, refer to our stage-1 model:
-  🔗 [`nassimaODL/ssml-text2breaks-fr-lora`](https://huggingface.co/nassimaODL/ssml-text2breaks-fr-lora)
----
 ## 📖 Citation
 @inproceedings{ould-ouali2025_improving,
   title     = {Improving Synthetic Speech Quality via SSML Prosody Control},
   author    = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
-  booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)}, % TODO: verify the exact title used by the conference
   year      = {2025},
-  pages     = {XX--YY},   % TODO
-  publisher = {—},        % TODO
-  address   = {—}         % TODO
 }

 ---
 license: apache-2.0
 base_model: Qwen/Qwen2.5-7B
 library_name: peft
+language:
+- fr
 tags:
+- lora
+- peft
 - ssml
+- text-to-speech
 - qwen2.5
+pipeline_tag: text-generation
 ---
+# 🗣️ French Breaks-to-SSML LoRA Model
+**hi-paris/ssml-breaks2ssml-fr-lora** is a LoRA adapter fine-tuned on Qwen2.5-7B to convert text with symbolic `<break/>` markers into rich SSML markup with prosody control (pitch, rate, volume) and precise break timing.
+This is the **second stage** of a two-step SSML cascade pipeline for improving French text-to-speech prosody control.
+> 📄 **Paper**: *"Improving Synthetic Speech Quality via SSML Prosody Control"*
+> **Authors**: Nassima Ould-Ouali, Awais Sani, Ruben Bueno, Jonah Dauvet, Tim Luka Horstmann, Eric Moulines
+> **Conference**: ICNLSP 2025
+> 🔗 **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/
 ## 🧩 Pipeline Overview
+| Stage | Model | Purpose |
+|-------|-------|---------|
+| 1️⃣ | [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora) | Predicts natural pause locations |
+| 2️⃣ | **hi-paris/ssml-breaks2ssml-fr-lora** | Converts breaks to full SSML with prosody |
 ## ✨ Example
+**Input:**
+```
+Bonjour comment allez-vous ?<break/>
+```
+**Output:**
+```
+<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous ?</prosody><break time="300ms"/>
 ```
+## 🚀 Quick Start
+### Installation
+```bash
+pip install torch transformers peft accelerate
+```
+### Basic Usage
+```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 from peft import PeftModel
+import torch
+# Load base model and tokenizer
+base_model = AutoModelForCausalLM.from_pretrained(
+    "Qwen/Qwen2.5-7B",
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
 tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
+# Load LoRA adapter
+model = PeftModel.from_pretrained(base_model, "hi-paris/ssml-breaks2ssml-fr-lora")
+# Prepare input (text with <break/> markers)
+text_with_breaks = "Bonjour comment allez-vous ?<break/>"
+formatted_input = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text_with_breaks}\n\n### SSML:\n"
+# Generate
+inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=128,
+        do_sample=False,  # greedy decoding; a temperature value would be ignored here
+        pad_token_id=tokenizer.eos_token_id
+    )
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+result = response.split("### SSML:\n")[-1].strip()
+print(result)
 ```
+### Production Usage (Recommended)
+For production use with memory optimization, see our [inference repository](https://github.com/TimLukaHorstmann/cascading_model):
+```python
+from breaks2ssml_inference import Breaks2SSMLInference
+# Memory-efficient shared model approach
+model = Breaks2SSMLInference()
+result = model.predict("Bonjour comment allez-vous ?<break/>")
+```
+## 🔧 Full Cascade Example
+```python
+from breaks2ssml_inference import CascadedInference
+# Initialize full pipeline (memory efficient - single base model)
+cascade = CascadedInference()
+# Convert plain text directly to full SSML
+text = "Bonjour comment allez-vous aujourd'hui ?"
+ssml_output = cascade.predict(text)
+print(ssml_output)
+# Illustrative output (actual attribute values and whitespace vary by input):
+# <prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="300ms"/>
+```
+## 🧠 Model Details
+- **Base Model**: [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
+- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+- **LoRA Rank**: 8, Alpha: 16
+- **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
+- **Training**: 5 epochs, batch size 1 with gradient accumulation
+- **Language**: French
+- **Model Size**: 7B parameters (LoRA adapter: ~81MB)
+- **License**: Apache 2.0
+## 📊 Performance
+| Metric | Score |
+|--------|-------|
+| Pause Insertion Accuracy | 87.3% |
+| RMSE (pause duration) | 98.5 ms |
+| MOS gain (vs. baseline) | +0.42 |
+*Evaluation was performed on a held-out French validation set with annotated SSML pauses. Mean Opinion Score (MOS) improvements were assessed on TTS outputs rendered with the Azure Henri voice and rated by 30 native French speakers.*
+## 🎯 SSML Features Generated
+- **Prosody Control**: Dynamic pitch, rate, and volume adjustments
+- **Break Timing**: Precise pause durations (e.g., `<break time="300ms"/>`)
+- **Contextual Adaptation**: Prosody values adapted to semantic content
+## ⚠️ Limitations
+- Optimized primarily for Azure TTS voices (e.g., `fr-FR-HenriNeural`)
+- Requires input text with `<break/>` markers (use Stage 1 model for automatic prediction)
+- Output is limited to `<break>` tags plus a `<prosody>` wrapper carrying pitch, rate, and volume
+## 🔗 Resources
+- **Full Pipeline Code**: https://github.com/TimLukaHorstmann/cascading_model
+- **Interactive Demo**: [Colab Notebook](https://colab.research.google.com/drive/1bFcbJQY9OuY0_zlscqkf9PIgd3dUrIKs?usp=sharing)
+- **Stage 1 Model**: [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora)
 ## 📖 Citation
+```bibtex
 @inproceedings{ould-ouali2025_improving,
   title     = {Improving Synthetic Speech Quality via SSML Prosody Control},
   author    = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
+  booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
   year      = {2025},
+  url       = {https://huggingface.co/hi-paris}
 }
+```
+## 📜 License
+Apache 2.0 License (same as the base Qwen2.5-7B model)
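
The "Basic Usage" snippet in the README diff above builds a fixed prompt and then recovers the SSML by splitting the decoded text on the `### SSML:` marker, because a causal LM returns the prompt followed by the completion. A minimal sketch of those two steps as reusable helpers follows. The names `build_prompt` and `extract_ssml` are hypothetical and do not come from the repository; the decoded string in the demo is a stand-in for real model output.

```python
# Prompt layout copied from the "Basic Usage" snippet in the README diff.
PROMPT_TEMPLATE = (
    "### Task:\nConvert text to SSML with pauses:\n\n"
    "### Text:\n{text}\n\n### SSML:\n"
)
SSML_MARKER = "### SSML:\n"


def build_prompt(text_with_breaks: str) -> str:
    """Format text containing <break/> markers into the stage-2 prompt."""
    return PROMPT_TEMPLATE.format(text=text_with_breaks)


def extract_ssml(decoded: str) -> str:
    """Return whatever follows the last '### SSML:' marker, stripped.

    If the marker is absent, the whole (stripped) string is returned.
    """
    return decoded.split(SSML_MARKER)[-1].strip()


if __name__ == "__main__":
    prompt = build_prompt("Bonjour comment allez-vous ?<break/>")
    # Stand-in for tokenizer.decode(outputs[0], ...): prompt plus completion.
    decoded = prompt + (
        '<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">'
        'Bonjour comment allez-vous ?</prosody><break time="300ms"/>\n'
    )
    print(extract_ssml(decoded))
```

Keeping the marker in one constant means the prompt and the extraction step cannot drift apart if the template is edited later.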

added_tokens.json DELETED

@@ -1,24 +0,0 @@
-{
-  "</tool_call>": 151658,
-  "<tool_call>": 151657,
-  "<|box_end|>": 151649,
-  "<|box_start|>": 151648,
-  "<|endoftext|>": 151643,
-  "<|file_sep|>": 151664,
-  "<|fim_middle|>": 151660,
-  "<|fim_pad|>": 151662,
-  "<|fim_prefix|>": 151659,
-  "<|fim_suffix|>": 151661,
-  "<|im_end|>": 151645,
-  "<|im_start|>": 151644,
-  "<|image_pad|>": 151655,
-  "<|object_ref_end|>": 151647,
-  "<|object_ref_start|>": 151646,
-  "<|quad_end|>": 151651,
-  "<|quad_start|>": 151650,
-  "<|repo_name|>": 151663,
-  "<|video_pad|>": 151656,
-  "<|vision_end|>": 151653,
-  "<|vision_pad|>": 151654,
-  "<|vision_start|>": 151652
-}

chat_template.jinja DELETED

@@ -1,54 +0,0 @@
-{%- if tools %}
-    {{- '<|im_start|>system\n' }}
-    {%- if messages[0]['role'] == 'system' %}
-        {{- messages[0]['content'] }}
-    {%- else %}
-        {{- 'You are a helpful assistant.' }}
-    {%- endif %}
-    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
-    {%- for tool in tools %}
-        {{- "\n" }}
-        {{- tool | tojson }}
-    {%- endfor %}
-    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
-{%- else %}
-    {%- if messages[0]['role'] == 'system' %}
-        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
-    {%- else %}
-        {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
-    {%- endif %}
-{%- endif %}
-{%- for message in messages %}
-    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
-        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
-    {%- elif message.role == "assistant" %}
-        {{- '<|im_start|>' + message.role }}
-        {%- if message.content %}
-            {{- '\n' + message.content }}
-        {%- endif %}
-        {%- for tool_call in message.tool_calls %}
-            {%- if tool_call.function is defined %}
-                {%- set tool_call = tool_call.function %}
-            {%- endif %}
-            {{- '\n<tool_call>\n{"name": "' }}
-            {{- tool_call.name }}
-            {{- '", "arguments": ' }}
-            {{- tool_call.arguments | tojson }}
-            {{- '}\n</tool_call>' }}
-        {%- endfor %}
-        {{- '<|im_end|>\n' }}
-    {%- elif message.role == "tool" %}
-        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
-            {{- '<|im_start|>user' }}
-        {%- endif %}
-        {{- '\n<tool_response>\n' }}
-        {{- message.content }}
-        {{- '\n</tool_response>' }}
-        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
-            {{- '<|im_end|>\n' }}
-        {%- endif %}
-    {%- endif %}
-{%- endfor %}
-{%- if add_generation_prompt %}
-    {{- '<|im_start|>assistant\n' }}
-{%- endif %}

merges.txt DELETED

The diff for this file is too large to render. See raw diff

notebook.ipynb ADDED

@@ -0,0 +1,378 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# French SSML Cascade Models Demo\n",
+        "\n",
+        "<img src=\"https://www.hi-paris.fr/wp-content/uploads/2020/09/logo-hi-paris-retina.png\" alt=\"Hi! Paris\" width=\"200\"/>\n",
+        "\n",
+        "**Interactive demonstration of French SSML cascade models for improved text-to-speech prosody control.**\n",
+        "\n",
+        "This notebook demonstrates the complete pipeline from plain French text to rich SSML markup with prosody control.\n",
+        "\n",
+        "## 🧩 Pipeline Overview\n",
+        "\n",
+        "1. **Text-to-Breaks**: Predicts natural pause locations  \n",
+        "2. **Breaks-to-SSML**: Adds prosody control (pitch, rate, volume) and precise timing\n",
+        "\n",
+        "📄 **Paper**: *Improving Synthetic Speech Quality via SSML Prosody Control* (ICNLSP 2025)  \n",
+        "🔗 **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/  \n",
+        "📚 **Models**: [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora) • [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)\n",
+        "\n",
+        "---"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 🚀 Setup\n",
+        "\n",
+        "### Step 1: Mount Google Drive"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 34,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "a1jNj9uK7EoL",
+        "outputId": "76624289-061f-4700-e397-50da9da9ee6d"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Mounted at /content/drive\n"
+          ]
+        }
+      ],
+      "source": [
+        "from google.colab import drive\n",
+        "drive.mount('/content/drive', force_remount=True)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Step 2: Clone Repository"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 35,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "eE3iUaX_7OLG",
+        "outputId": "d621b296-b12f-489a-bc1f-c7240c21646b"
+      },
+      "outputs": [
+        {
+          "name": "stderr",
+          "output_type": "stream",
+          "text": [
+            "shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",
+            "chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",
+            "Cloning into 'cascading_model'...\n"
+          ]
+        }
+      ],
+      "source": [
+        "%%bash\n",
+        "cd /content/drive/MyDrive/\n",
+        "git clone https://github.com/TimLukaHorstmann/cascading_model.git"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 36,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "vItNbMvh7ZNL",
+        "outputId": "31a31144-1261-4427-9d2e-089ae17689b2"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "/content/drive/MyDrive/cascading_model\n"
+          ]
+        }
+      ],
+      "source": [
+        "%cd /content/drive/MyDrive/cascading_model/\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 37,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "JdeuCOX_7kae",
+        "outputId": "f8bad5e1-92d0-4531-fbe0-ca2f29a8efd8"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "breaks2ssml_inference.py\n",
+            "demo.py\n",
+            "empty_ssml_creation.py\n",
+            "__init__.py\n",
+            "pyproject.toml\n",
+            "README.md\n",
+            "requirements.txt\n",
+            "shared_models.py\n",
+            "test_models.py\n",
+            "text2breaks_inference.py\n"
+          ]
+        }
+      ],
+      "source": [
+        "%%bash\n",
+        "ls"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 🧪 Testing & Demo\n",
+        "\n",
+        "### Step 3: Verify Installation"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 38,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "eaBx_eh-819B",
+        "outputId": "2c55f4fa-f17e-49b8-b032-74d670dcd34a"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "2025-08-06 12:36:48.453347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
+            "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
+            "E0000 00:00:1754483808.475278   35366 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
+            "E0000 00:00:1754483808.481612   35366 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
+            "============================================================\n",
+            "🧪 French SSML Models - Test Suite\n",
+            "============================================================\n",
+            "🔍 Testing imports...\n",
+            "   ✅ PyTorch 2.5.1+cu121\n",
+            "   ✅ Transformers 4.54.0\n",
+            "   ✅ PEFT 0.16.0\n",
+            "   ✅ All imports successful!\n",
+            "\n",
+            "🔧 Testing model loading...\n",
+            "   Loading text2breaks model...\n",
+            "Loading checkpoint shards: 100% 4/4 [01:33<00:00, 23.46s/it]\n",
+            "   ✅ Text2breaks model loaded\n",
+            "   Loading breaks2ssml model...\n",
+            "   ✅ Breaks2ssml model loaded\n",
+            "   ✅ All models loaded successfully!\n",
+            "\n",
+            "🧪 Testing inference...\n",
+            "   Input: Bonjour comment allez-vous ?\n",
+            "   Testing text2breaks...\n",
+            "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+            "   Step 1 result: Bonjour comment allez-vous ?<break/>\n",
+            "   Testing breaks2ssml...\n",
+            "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+            "   Step 2 result: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+            "    Bonjour comment allez-vous ?\n",
+            "  </prosody>\n",
+            "  <break time=\"500ms\"/>\n",
+            "   ✅ Inference test successful!\n",
+            "\n",
+            "🔗 Testing full cascade...\n",
+            "   Input: Bonsoir comment ça va ?\n",
+            "   Cascade result: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+            "    Bonsoir comment ça va ?\n",
+            "  </prosody>\n",
+            "  <break time=\"500ms\"/>\n",
+            "   ✅ Cascade test successful!\n",
+            "\n",
+            "============================================================\n",
+            "🎉 All tests passed! The models are working correctly.\n",
+            "============================================================\n",
+            "\n",
+            "You can now use:\n",
+            "  - python demo.py (for examples)\n",
+            "  - python demo.py --interactive (for interactive mode)\n",
+            "  - python text2breaks_inference.py --interactive\n",
+            "  - python breaks2ssml_inference.py --interactive\n"
+          ]
+        }
+      ],
+      "source": [
+        "!python test_models.py"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Step 4: Interactive Demo\n",
+        "\n",
+        "Run the interactive demo to test the models with your own French text:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 29,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "ZIeUY9atUhvV",
+        "outputId": "581f1395-fa70-424f-9c66-50b5e44547c3"
+      },
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "2025-08-06 12:21:35.541051: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
+            "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
+            "E0000 00:00:1754482895.561958   31169 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
+            "E0000 00:00:1754482895.568312   31169 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
+            "================================================================================\n",
+            "Interactive French SSML Cascade\n",
+            "================================================================================\n",
+            "\n",
+            "Choose mode:\n",
+            "1. Full cascade (text → breaks → SSML)\n",
+            "2. Text to breaks only\n",
+            "3. Breaks to SSML only\n",
+            "\n",
+            "Select mode (1-3): 1\n",
+            "\n",
+            "Initializing models...\n",
+            "Loading checkpoint shards: 100% 4/4 [01:30<00:00, 22.70s/it]\n",
+            "Models loaded successfully!\n",
+            "\n",
+            "Enter French text (empty line to exit):\n",
+            "\n",
+            "> Je suis Luka.\n",
+            "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+            "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+            "Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+            "    Je suis Luka.\n",
+            "  </prosody>\n",
+            "  <break time=\"500ms\"/>\n",
+            "Time: 6.55s\n",
+            "\n",
+            "> Trés bien.\n",
+            "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+            "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+            "Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+            "    Trés bien.\n",
+            "  </prosody>\n",
+            "  <break time=\"500ms\"/>\n",
+            "Time: 5.64s\n",
+            "\n",
+            "> Je suis Bertrand Perier. Je suis avocat et vous écoutez ma masterclass.\n",
+            "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+            "The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
+            "Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
+            "    Je suis Bertrand Perier.\n",
+            "  </prosody>\n",
+            "  <break time=\"500ms\"/>\n",
+            "\n",
+            "  <prosody pitch=\"+3.78%\" rate=\"-1.29%\" volume=\"-10.00%\">\n",
+            "    Je suis avocat et vous écoutez ma masterclass.\n",
+            "  </prosody>\n",
+            "  <break time=\"500ms\"/>\n",
+            "Time: 12.11s\n",
+            "\n",
+            "> Exception ignored in: <module 'threading' from '/usr/lib/python3.11/threading.py'>\n",
+            "Traceback (most recent call last):\n",
+            "  File \"/usr/lib/python3.11/threading.py\", line 1541, in _shutdown\n",
+            "    def _shutdown():\n",
+            "    \n",
+            "KeyboardInterrupt: \n"
+          ]
+        }
+      ],
+      "source": [
+        "!python demo.py --interactive"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 🎯 Example Usage\n",
+        "\n",
+        "```python\n",
+        "from breaks2ssml_inference import CascadedInference\n",
+        "\n",
+        "# Initialize the full cascade\n",
+        "cascade = CascadedInference()\n",
+        "\n",
+        "# Convert plain French text to SSML\n",
+        "text = \"Bonjour comment allez-vous aujourd'hui ?\"\n",
+        "result = cascade.predict(text)\n",
+        "print(result)\n",
+        "```\n",
+        "\n",
+        "**Expected Output:**\n",
+        "```xml\n",
+        "<prosody pitch=\"+2.5%\" rate=\"-1.2%\" volume=\"-5.0%\">Bonjour comment allez-vous aujourd'hui ?</prosody><break time=\"300ms\"/>\n",
+        "```\n",
+        "\n",
+        "## 📚 Resources\n",
+        "\n",
+        "- **Audio Demos**: https://horstmann.tech/ssml-prosody-control/\n",
+        "- **GitHub Repository**: https://github.com/TimLukaHorstmann/cascading_model\n",
+        "- **Stage 1 Model**: https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora\n",
+        "- **Stage 2 Model**: https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora\n",
+        "\n",
+        "---\n",
+        "*Hi! Paris - Interdisciplinary Research Institute for Artificial Intelligence*"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": []
+    }
+  ],
+  "metadata": {
+    "accelerator": "GPU",
+    "colab": {
+      "gpuType": "T4",
+      "provenance": []
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
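
The cell outputs recorded in the notebook above show that the cascade emits an SSML fragment: a sequence of sibling `<prosody>` and `<break>` elements with no single root. The sketch below wraps such a fragment so it can be parsed and inspected, and also builds a full document for a TTS engine. The function names `parse_fragment` and `wrap_for_azure` are hypothetical. The `<speak>`/`<voice>` envelope follows the usual Azure SSML layout and is an assumption here, not something taken from the repository.

```python
import xml.etree.ElementTree as ET

# Standard SSML namespace (assumed for the Azure-style envelope).
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def parse_fragment(fragment: str) -> list[dict]:
    """Parse a rootless SSML fragment into a flat list of segments."""
    # A temporary root is needed because the fragment has several siblings.
    root = ET.fromstring(f"<speak>{fragment}</speak>")
    segments = []
    for node in root:
        if node.tag == "prosody":
            # Collapse the indentation and newlines seen in the model output.
            text = " ".join("".join(node.itertext()).split())
            segments.append({
                "type": "prosody",
                "text": text,
                "pitch": node.get("pitch"),
                "rate": node.get("rate"),
                "volume": node.get("volume"),
            })
        elif node.tag == "break":
            segments.append({"type": "break", "time": node.get("time")})
    return segments


def wrap_for_azure(fragment: str, voice: str = "fr-FR-HenriNeural",
                   lang: str = "fr-FR") -> str:
    """Wrap a fragment in a <speak>/<voice> envelope (assumed layout)."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f'<voice name="{voice}">{fragment}</voice></speak>'
    )


if __name__ == "__main__":
    # Fragment copied from the interactive demo output in the notebook.
    fragment = (
        '<prosody pitch="+0.64%" rate="-1.92%" volume="-10.00%">\n'
        "    Je suis Luka.\n"
        "  </prosody>\n"
        '  <break time="500ms"/>'
    )
    for segment in parse_fragment(fragment):
        print(segment)
```

Parsing will raise `xml.etree.ElementTree.ParseError` if the model produces malformed markup, which makes this a cheap sanity check before sending text to a synthesizer.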

special_tokens_map.json DELETED

@@ -1,31 +0,0 @@
-{
-  "additional_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "eos_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  }
-}

tokenizer.json DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
-size 11421896

tokenizer_config.json DELETED

@@ -1,207 +0,0 @@
-{
-  "add_bos_token": false,
-  "add_prefix_space": false,
-  "added_tokens_decoder": {
-    "151643": {
-      "content": "<|endoftext|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151644": {
-      "content": "<|im_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151645": {
-      "content": "<|im_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151646": {
-      "content": "<|object_ref_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151647": {
-      "content": "<|object_ref_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151648": {
-      "content": "<|box_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151649": {
-      "content": "<|box_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151650": {
-      "content": "<|quad_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151651": {
-      "content": "<|quad_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151652": {
-      "content": "<|vision_start|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151653": {
-      "content": "<|vision_end|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151654": {
-      "content": "<|vision_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151655": {
-      "content": "<|image_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151656": {
-      "content": "<|video_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": true
-    },
-    "151657": {
-      "content": "<tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151658": {
-      "content": "</tool_call>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151659": {
-      "content": "<|fim_prefix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151660": {
-      "content": "<|fim_middle|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151661": {
-      "content": "<|fim_suffix|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151662": {
-      "content": "<|fim_pad|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151663": {
-      "content": "<|repo_name|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    },
-    "151664": {
-      "content": "<|file_sep|>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false,
-      "special": false
-    }
-  },
-  "additional_special_tokens": [
-    "<|im_start|>",
-    "<|im_end|>",
-    "<|object_ref_start|>",
-    "<|object_ref_end|>",
-    "<|box_start|>",
-    "<|box_end|>",
-    "<|quad_start|>",
-    "<|quad_end|>",
-    "<|vision_start|>",
-    "<|vision_end|>",
-    "<|vision_pad|>",
-    "<|image_pad|>",
-    "<|video_pad|>"
-  ],
-  "bos_token": null,
-  "clean_up_tokenization_spaces": false,
-  "eos_token": "<|endoftext|>",
-  "errors": "replace",
-  "extra_special_tokens": {},
-  "model_max_length": 131072,
-  "pad_token": "<|endoftext|>",
-  "split_special_tokens": false,
-  "tokenizer_class": "Qwen2Tokenizer",
-  "unk_token": null
-}

vocab.json DELETED

The diff for this file is too large to render. See raw diff