---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B
library_name: peft
language:
- fr
tags:
- text-to-speech
- lora
- peft
- ssml
- qwen2.5
pipeline_tag: text-generation
---
# French Text-to-Breaks LoRA Model
**hi-paris/ssml-text2breaks-fr-lora** is a LoRA adapter fine-tuned on Qwen2.5-7B to predict natural pause locations in French text by adding symbolic `<break/>` markers.
This is the **first stage** of a two-stage SSML cascade for improving prosody control in French text-to-speech synthesis.
> **Paper**: *"Improving Synthetic Speech Quality via SSML Prosody Control"*
> **Authors**: Nassima Ould-Ouali, Awais Sani, Ruben Bueno, Jonah Dauvet, Tim Luka Horstmann, Eric Moulines
> **Conference**: ICNLSP 2025
> **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/
## Pipeline Overview
| Stage | Model | Purpose |
|-------|-------|---------|
| 1 | **hi-paris/ssml-text2breaks-fr-lora** | Predicts natural pause locations |
| 2 | [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora) | Converts breaks to full SSML with prosody |
## Example
**Input:**
```
Bonjour comment allez-vous aujourd'hui ?
```
**Output:**
```
Bonjour comment allez-vous aujourd'hui ?<break/>
```
## Quick Start
### Installation
```bash
pip install torch transformers peft accelerate
```
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "hi-paris/ssml-text2breaks-fr-lora")

# Prepare input using the training prompt format
text = "Bonjour comment allez-vous aujourd'hui ?"
formatted_input = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n"

# Generate
inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Strip the prompt and keep only the generated annotation
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
result = response.split("### SSML:\n")[-1].strip()
print(result)  # "Bonjour comment allez-vous aujourd'hui ?<break/>"
```
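If you need to annotate more than one sentence, the snippet above generalizes directly. The helper below is a minimal sketch that reuses the `model` and `tokenizer` from the Basic Usage example (the `predict_breaks` name is ours, not part of the released code); it keeps the same prompt format but switches to greedy decoding, which tends to be preferable for a deterministic annotation task:

```python
def predict_breaks(texts, model, tokenizer, max_new_tokens=256):
    """Sketch: predict <break/> markers for a list of French sentences."""
    results = []
    for text in texts:
        prompt = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n"
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=False,  # greedy decoding for reproducible break placement
                pad_token_id=tokenizer.eos_token_id,
            )
        decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
        results.append(decoded.split("### SSML:\n")[-1].strip())
    return results

print(predict_breaks(["Bonjour comment allez-vous aujourd'hui ?"], model, tokenizer))
```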
### Production Usage (Recommended)
For production use, with memory optimizations and the full two-stage cascade, see our [inference repository](https://github.com/TimLukaHorstmann/cascading_model):
```python
from text2breaks_inference import Text2BreaksInference
# Memory-efficient shared model approach
model = Text2BreaksInference()
result = model.predict("Bonjour comment allez-vous aujourd'hui ?")
```
## Full Cascade Example
```python
from breaks2ssml_inference import CascadedInference
# Initialize full pipeline (memory efficient)
cascade = CascadedInference()
# Convert plain text directly to full SSML
text = "Bonjour comment allez-vous aujourd'hui ?"
ssml_output = cascade.predict(text)
print(ssml_output)
# Output: '<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="300ms"/>'
```
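If you would rather stay within `transformers` and `peft` instead of pulling in the inference repository, both adapters can share a single copy of the base weights via `peft`'s `load_adapter`/`set_adapter`. The sketch below shows the mechanics; the stage-2 prompt template is our assumption (it mirrors stage 1), so consult the inference repository for the exact format:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# Both adapters attach to the same frozen base weights, so memory cost is
# one 7B model plus two ~81 MB LoRA adapters.
model = PeftModel.from_pretrained(
    base, "hi-paris/ssml-text2breaks-fr-lora", adapter_name="text2breaks"
)
model.load_adapter("hi-paris/ssml-breaks2ssml-fr-lora", adapter_name="breaks2ssml")

def run(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs, max_new_tokens=256, do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

text = "Bonjour comment allez-vous aujourd'hui ?"

# Stage 1: plain text -> text with symbolic <break/> markers
model.set_adapter("text2breaks")
stage1 = run(f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n")
breaks_text = stage1.split("### SSML:\n")[-1].strip()

# Stage 2: <break/> markers -> full SSML with prosody (prompt format assumed)
model.set_adapter("breaks2ssml")
stage2 = run(f"### Task:\nConvert breaks to SSML with prosody:\n\n### Text:\n{breaks_text}\n\n### SSML:\n")
print(stage2.split("### SSML:\n")[-1].strip())
```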
## Model Details
- **Base Model**: [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 8, Alpha: 16 (see the `LoraConfig` sketch below)
- **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Training**: 5 epochs, batch size 1 with gradient accumulation
- **Language**: French
- **Model Size**: 7B parameters (LoRA adapter: ~81MB)
- **License**: Apache 2.0
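For reference, the hyperparameters above correspond to a `peft` `LoraConfig` along these lines. This is a reconstruction from the listed details, not the released training script; values the card does not state (such as `lora_dropout`) are marked as assumptions:

```python
from peft import LoraConfig

# Reconstructed from the Model Details list above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,  # assumption: not stated on this card
    bias="none",        # assumption: not stated on this card
    task_type="CAUSAL_LM",
)
```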
## Performance
Combined with the second-stage model, the predicted break locations improve prosody control in French text-to-speech synthesis; quantitative evaluation is reported in the ICNLSP 2025 paper linked above.
## Resources
- **Full Pipeline Code**: https://github.com/TimLukaHorstmann/cascading_model
- **Interactive Demo**: [Colab Notebook](https://colab.research.google.com/drive/1bFcbJQY9OuY0_zlscqkf9PIgd3dUrIKs?usp=sharing)
- **Stage 2 Model**: [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)
## Citation
```bibtex
@inproceedings{ould-ouali2025_improving,
  title     = {Improving Synthetic Speech Quality via SSML Prosody Control},
  author    = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
  booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
  year      = {2025},
  url       = {https://huggingface.co/hi-paris}
}
```
## License
Apache 2.0 License (same as the base Qwen2.5-7B model)