---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B
library_name: peft
language:
- fr
tags:
- text-to-speech
- lora
- peft
- ssml
- qwen2.5
pipeline_tag: text-generation
---

# πŸ—£οΈ French Text-to-Breaks LoRA Model

**hi-paris/ssml-text2breaks-fr-lora** is a LoRA adapter fine-tuned on Qwen2.5-7B to predict natural pause locations in French text by adding symbolic `<break/>` markers.

This is the **first stage** of a two-step SSML cascade pipeline for improving French text-to-speech prosody control.

> πŸ“„ **Paper**: *"Improving Synthetic Speech Quality via SSML Prosody Control"*  
> **Authors**: Nassima Ould-Ouali, Awais Sani, Ruben Bueno, Jonah Dauvet, Tim Luka Horstmann, Eric Moulines  
> **Conference**: ICNLSP 2025  
> πŸ”— **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/

## 🧩 Pipeline Overview

| Stage | Model | Purpose |
|-------|-------|---------|
| 1️⃣ | **hi-paris/ssml-text2breaks-fr-lora** | Predicts natural pause locations |
| 2️⃣ | [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora) | Converts breaks to full SSML with prosody |

## ✨ Example

**Input:**
```
Bonjour comment allez-vous aujourd'hui ?
```

**Output:**
```
Bonjour comment allez-vous aujourd'hui ?<break/>
```
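Since stage 1 is only supposed to insert symbolic `<break/>` markers, its output can be sanity-checked by stripping the markers and comparing against the input. A minimal illustrative helper (not part of the released code):

```python
def only_adds_breaks(original: str, annotated: str) -> bool:
    """Return True if `annotated` differs from `original` only by
    inserted <break/> markers (no paraphrasing or dropped text)."""
    return annotated.replace("<break/>", "") == original
```

Any mismatch indicates the model altered the text rather than just annotating it, which should be treated as a generation error.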

## πŸš€ Quick Start

### Installation

```bash
pip install torch transformers peft accelerate
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "hi-paris/ssml-text2breaks-fr-lora")

# Prepare input
text = "Bonjour comment allez-vous aujourd'hui ?"
formatted_input = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n"

# Generate
inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
result = response.split("### SSML:\n")[-1].strip()
print(result)  # "Bonjour comment allez-vous aujourd'hui ?<break/>"
```
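The model-independent parts of the snippet above (prompt construction and answer extraction) are pure string operations and can be factored into small, testable helpers. A sketch with illustrative names:

```python
# Instruction-style template the adapter was fine-tuned on (see above).
PROMPT_TEMPLATE = (
    "### Task:\nConvert text to SSML with pauses:\n\n"
    "### Text:\n{text}\n\n### SSML:\n"
)

def format_prompt(text: str) -> str:
    # Build the prompt for a single input sentence.
    return PROMPT_TEMPLATE.format(text=text)

def extract_ssml(decoded: str) -> str:
    # generate() echoes the prompt, so keep only what follows the marker.
    return decoded.split("### SSML:\n")[-1].strip()
```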

### Production Usage (Recommended)

For production use with memory optimization and the full two-stage cascade, see our [inference repository](https://github.com/TimLukaHorstmann/cascading_model):

```python
from text2breaks_inference import Text2BreaksInference

# Memory-efficient shared model approach
model = Text2BreaksInference()
result = model.predict("Bonjour comment allez-vous aujourd'hui ?")
```

## πŸ”§ Full Cascade Example

```python
from breaks2ssml_inference import CascadedInference

# Initialize full pipeline (memory efficient)
cascade = CascadedInference()

# Convert plain text directly to full SSML
text = "Bonjour comment allez-vous aujourd'hui ?"
ssml_output = cascade.predict(text)
print(ssml_output)  
# Output: '<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="300ms"/>'
```


## 🧠 Model Details

- **Base Model**: [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 8, Alpha: 16
- **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Training**: 5 epochs, batch size 1 with gradient accumulation
- **Language**: French
- **Model Size**: 7B parameters (LoRA adapter: ~81MB)
- **License**: Apache 2.0
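The LoRA hyperparameters listed above map directly onto a `peft.LoraConfig`. A sketch for reproducing the adapter shape (the dropout value and task type are assumptions, not stated in this card):

```python
from peft import LoraConfig

# Rank, alpha, and target modules as listed above; the rest are assumptions.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```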

## πŸ“Š Performance

The model predicts natural pause locations in French text with high accuracy (see the paper for evaluation details) and, combined with the second-stage model, improves prosody in the synthesized speech.

## πŸ”— Resources

- **Full Pipeline Code**: https://github.com/TimLukaHorstmann/cascading_model
- **Interactive Demo**: [Colab Notebook](https://colab.research.google.com/drive/1bFcbJQY9OuY0_zlscqkf9PIgd3dUrIKs?usp=sharing)
- **Stage 2 Model**: [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)

## πŸ“– Citation

```bibtex
@inproceedings{ould-ouali2025_improving,
  title     = {Improving Synthetic Speech Quality via SSML Prosody Control},
  author    = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
  booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
  year      = {2025},
  url       = {https://huggingface.co/hi-paris}
}
```

## πŸ“œ License

Apache 2.0 License (same as the base Qwen2.5-7B model)