# French Text-to-Breaks LoRA Model
**hi-paris/ssml-text2breaks-fr-lora** is a LoRA adapter fine-tuned on Qwen2.5-7B to predict natural pause locations in French text by inserting symbolic `<break/>` markers.
This is the first stage of a two-step SSML cascade pipeline for improving French text-to-speech prosody control.
**Paper:** "Improving Synthetic Speech Quality via SSML Prosody Control"
**Authors:** Nassima Ould-Ouali, Awais Sani, Ruben Bueno, Jonah Dauvet, Tim Luka Horstmann, Eric Moulines
**Conference:** ICNLSP 2025
**Demo & Audio Samples:** https://horstmann.tech/ssml-prosody-control/
## Pipeline Overview
| Stage | Model | Purpose |
|---|---|---|
| 1 | hi-paris/ssml-text2breaks-fr-lora | Predicts natural pause locations |
| 2 | hi-paris/ssml-breaks2ssml-fr-lora | Converts breaks to full SSML with prosody |
## Example

Input:

```text
Bonjour comment allez-vous aujourd'hui ?
```

Output:

```text
Bonjour comment allez-vous aujourd'hui ?<break/>
```
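The symbolic `<break/>` markers can be located with a simple regular expression before the text is handed to the second stage. The helpers below are a minimal illustrative sketch, not part of the released code:

```python
import re

# Matches the symbolic break marker emitted by the stage-1 model,
# tolerating optional whitespace before the slash (e.g. "<break />").
BREAK_RE = re.compile(r"<break\s*/>")

def count_breaks(text: str) -> int:
    """Count symbolic <break/> markers in the model output."""
    return len(BREAK_RE.findall(text))

def split_on_breaks(text: str) -> list[str]:
    """Split text into chunks at each <break/> marker, dropping empties."""
    return [chunk.strip() for chunk in BREAK_RE.split(text) if chunk.strip()]

output = "Bonjour comment allez-vous aujourd'hui ?<break/>"
print(count_breaks(output))     # 1
print(split_on_breaks(output))  # ["Bonjour comment allez-vous aujourd'hui ?"]
```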
## Quick Start

### Installation

```bash
pip install torch transformers peft accelerate
```
### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "hi-paris/ssml-text2breaks-fr-lora")

# Prepare input
text = "Bonjour comment allez-vous aujourd'hui ?"
formatted_input = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text}\n\n### SSML:\n"

# Generate
inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
result = response.split("### SSML:\n")[-1].strip()
print(result)  # "Bonjour comment allez-vous aujourd'hui ?<break/>"
```
### Production Usage (Recommended)

For production use with memory optimization and the full cascade, see our inference repository:

```python
from text2breaks_inference import Text2BreaksInference

# Memory-efficient shared-model approach
model = Text2BreaksInference()
result = model.predict("Bonjour comment allez-vous aujourd'hui ?")
```
## Full Cascade Example

```python
from breaks2ssml_inference import CascadedInference

# Initialize the full pipeline (memory efficient)
cascade = CascadedInference()

# Convert plain text directly to full SSML
text = "Bonjour comment allez-vous aujourd'hui ?"
ssml_output = cascade.predict(text)
print(ssml_output)
# Output: <prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="300ms"/>
```
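The cascade emits an SSML fragment; most TTS engines expect it wrapped in a `<speak>` root element. The sketch below shows that final wrapping step (the `to_ssml_document` helper is illustrative and not part of the released code):

```python
def to_ssml_document(fragment: str, lang: str = "fr-FR") -> str:
    """Wrap an SSML fragment in a <speak> root element for a TTS engine."""
    return f'<speak version="1.0" xml:lang="{lang}">{fragment}</speak>'

# Example fragment as produced by the cascade above
fragment = (
    '<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">'
    "Bonjour comment allez-vous aujourd'hui ?"
    '</prosody><break time="300ms"/>'
)
print(to_ssml_document(fragment))
```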
## Model Details

- **Base Model:** Qwen/Qwen2.5-7B
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8, **Alpha:** 16
- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Training:** 5 epochs, batch size 1 with gradient accumulation
- **Language:** French
- **Model Size:** 7B parameters (LoRA adapter: ~81 MB)
- **License:** Apache 2.0
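The hyperparameters above correspond to an adapter configuration along these lines, reconstructed as a plain dict from the listed values (a sketch mirroring the usual `adapter_config.json` layout, not the exact training script):

```python
# Key fields of the LoRA adapter configuration, reconstructed from the
# hyperparameters listed above.
lora_config = {
    "r": 8,            # LoRA rank
    "lora_alpha": 16,  # scaling factor
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "base_model_name_or_path": "Qwen/Qwen2.5-7B",
    "task_type": "CAUSAL_LM",
}

# Effective LoRA scaling applied to each adapted projection: alpha / r
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```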
## Performance

The model reliably predicts natural pause locations in French text and, combined with the second-stage model, improves prosody in text-to-speech synthesis. See the paper for the full evaluation.
## Resources

- **Full Pipeline Code:** https://github.com/TimLukaHorstmann/cascading_model
- **Interactive Demo:** Colab Notebook
- **Stage 2 Model:** hi-paris/ssml-breaks2ssml-fr-lora
## Citation

```bibtex
@inproceedings{ould-ouali2025_improving,
  title     = {Improving Synthetic Speech Quality via SSML Prosody Control},
  author    = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
  booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
  year      = {2025},
  url       = {https://huggingface.co/hi-paris}
}
```
## License

Apache 2.0 (same as the base Qwen2.5-7B model)