TimLukaHorstmann committed on
Commit · 486463d
1 Parent(s): f8457de
Updated model + inference + model card + colab
- README.md +113 -94
- added_tokens.json +0 -24
- chat_template.jinja +0 -54
- merges.txt +0 -0
- notebook.ipynb +378 -0
- special_tokens_map.json +0 -31
- tokenizer.json +0 -3
- tokenizer_config.json +0 -207
- vocab.json +0 -0
README.md
CHANGED
@@ -1,153 +1,172 @@
|
|
1 |
---
|
2 |
-
|
3 |
-
|
4 |
license: apache-2.0
|
5 |
base_model: Qwen/Qwen2.5-7B
|
6 |
library_name: peft
|
|
|
|
|
7 |
tags:
|
8 |
-
-
|
|
|
9 |
- ssml
|
10 |
-
-
|
11 |
- qwen2.5
|
12 |
-
-
|
13 |
-
|
14 |
-
|
15 |
-
---
|
16 |
-
|
17 |
-
# 🗣️ ssml-break2ssml-fr-lora
|
18 |
-
|
19 |
-
|
20 |
-
This is the second-stage LoRA adapter for **French SSML generation**, converting *pause-annotated text* into full SSML markup with `<break>` tags.
|
21 |
-
|
22 |
-
This model is part of the cascade described in the paper:
|
23 |
-
|
24 |
-
**"Improving French Synthetic Speech Quality via SSML Prosody Control"**
|
25 |
-
Nassima Ould-Ouali, Éric Moulines – *ICNLSP 2025 (Springer LNCS)* [accepted].
|
26 |
-
|
27 |
-
|
28 |
---
|
29 |
|
|
|
30 |
|
31 |
-
|
32 |
|
33 |
-
|
34 |
-
- **Adapter method**: LoRA (Low-Rank Adaptation via [`peft`](https://github.com/huggingface/peft))
|
35 |
-
- **LoRA rank**: 8 — **Alpha**: 16
|
36 |
-
- **Training**: 5 epochs, batch size 1 (gradient accumulation)
|
37 |
-
- **Languages**: French
|
38 |
-
- **Model size**: 7B (adapter-only)
|
39 |
-
- **License**: Apache 2.0
|
40 |
|
41 |
-
|
|
|
|
|
|
|
42 |
|
43 |
## 🧩 Pipeline Overview
|
44 |
|
45 |
-
|
46 |
-
|
47 |
-
|
|
48 |
-
|
49 |
-
| 1️⃣ | `nassimaODL/ssml-text2breaks-fr-lora` | Inserts symbolic pauses like `#250`, `#500` |
|
50 |
-
| 2️⃣ | `nassimaODL/ssml-break2ssml-fr-lora` | Converts symbols to `<break time="..."/>` SSML |
|
51 |
-
|
52 |
|
53 |
## ✨ Example
|
54 |
|
55 |
-
|
56 |
-
|
57 |
-
|
|
|
58 |
|
|
|
|
|
|
|
59 |
```
|
60 |
|
61 |
-
|
62 |
|
|
|
63 |
|
64 |
-
|
|
|
|
|
65 |
|
66 |
-
|
67 |
|
|
|
68 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
69 |
from peft import PeftModel
|
70 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
71 |
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
|
72 |
-
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", device_map="auto")
|
73 |
-
model = PeftModel.from_pretrained(base_model, "nassimaODL/ssml-break2ssml-fr-lora")
|
74 |
|
75 |
-
|
76 |
-
|
77 |
|
78 |
-
with
|
79 |
-
|
80 |
-
|
81 |
|
82 |
```
|
83 |
|
84 |
-
|
85 |
-
|
86 |
-
|
87 |
-
## 🧪 Evaluation Summary
|
88 |
|
|
|
89 |
|
90 |
-
|
91 |
-
|
92 |
-
| Pause Insertion Accuracy | 87.3% |
|
93 |
-
| RMSE (pause duration) | 98.5 ms |
|
94 |
-
| MOS gain (vs. baseline) | +0.42 |
|
95 |
|
96 |
-
|
|
|
|
|
|
|
97 |
|
|
|
98 |
|
99 |
-
|
|
|
100 |
|
|
|
|
|
101 |
|
102 |
-
|
|
|
|
|
|
|
|
|
|
|
103 |
|
|
|
104 |
|
105 |
-
|
106 |
|
107 |
-
|
108 |
-
- Voice activity boundaries (via Auditok)
|
109 |
-
- F0 contour analysis (pitch dips before breaks)
|
110 |
-
- Syntactic cues (punctuation, conjunctions)
|
111 |
|
112 |
-
|
113 |
-
|
|
|
|
|
|
|
114 |
|
115 |
-
|
116 |
|
117 |
-
##
|
118 |
|
119 |
-
- **
|
120 |
-
- **
|
121 |
-
- **
|
122 |
-
- **Precision**: bf16
|
123 |
-
- **Max sequence length**: 768 tokens (256 input + 512 output)
|
124 |
-
- **Epochs**: 5
|
125 |
-
- **Optimizer**: AdamW (lr = 2e-4, no warmup)
|
126 |
-
- **LoRA target modules**:
|
127 |
-
`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
|
128 |
|
129 |
-
|
130 |
|
131 |
-
|
|
|
|
|
132 |
|
133 |
-
##
|
134 |
|
135 |
-
-
|
136 |
-
-
|
137 |
-
-
|
138 |
-
- The model assumes the presence of symbolic pause markers in the input (e.g., `#250`). For automatic prediction of such symbols, refer to our stage-1 model:
|
139 |
-
🔗 [`nassimaODL/ssml-text2breaks-fr-lora`](https://huggingface.co/nassimaODL/ssml-text2breaks-fr-lora)
|
140 |
-
|
141 |
-
---
|
142 |
|
143 |
## 📖 Citation
|
|
|
|
|
144 |
@inproceedings{ould-ouali2025_improving,
|
145 |
title = {Improving Synthetic Speech Quality via SSML Prosody Control},
|
146 |
author = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
|
147 |
-
booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
|
148 |
year = {2025},
|
149 |
-
|
150 |
-
publisher = {—}, % TODO
|
151 |
-
address = {—} % TODO
|
152 |
}
|
|
|
|
|
|
|
153 |
|
|
|
|
1 |
---
|
|
|
|
|
2 |
license: apache-2.0
|
3 |
base_model: Qwen/Qwen2.5-7B
|
4 |
library_name: peft
|
5 |
+
language:
|
6 |
+
- fr
|
7 |
tags:
|
8 |
+
- lora
|
9 |
+
- peft
|
10 |
- ssml
|
11 |
+
- text-to-speech
|
12 |
- qwen2.5
|
13 |
+
pipeline_tag: text-generation
|
14 |
---
|
15 |
|
16 |
+
# 🗣️ French Breaks-to-SSML LoRA Model
|
17 |
|
18 |
+
**hi-paris/ssml-breaks2ssml-fr-lora** is a LoRA adapter fine-tuned on Qwen2.5-7B to convert text with symbolic `<break/>` markers into rich SSML markup with prosody control (pitch, rate, volume) and precise break timing.
|
19 |
|
20 |
+
This is the **second stage** of a two-step SSML cascade pipeline for improving French text-to-speech prosody control.
|
21 |
|
22 |
+
> 📄 **Paper**: *"Improving Synthetic Speech Quality via SSML Prosody Control"*
|
23 |
+
> **Authors**: Nassima Ould-Ouali, Awais Sani, Ruben Bueno, Jonah Dauvet, Tim Luka Horstmann, Eric Moulines
|
24 |
+
> **Conference**: ICNLSP 2025
|
25 |
+
> 🔗 **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/
|
26 |
|
27 |
## 🧩 Pipeline Overview
|
28 |
|
29 |
+
| Stage | Model | Purpose |
|
30 |
+
|-------|-------|---------|
|
31 |
+
| 1️⃣ | [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora) | Predicts natural pause locations |
|
32 |
+
| 2️⃣ | **hi-paris/ssml-breaks2ssml-fr-lora** | Converts breaks to full SSML with prosody |
|
|
|
|
|
|
|
33 |
|
34 |
## ✨ Example
|
35 |
|
36 |
+
**Input:**
|
37 |
+
```
|
38 |
+
Bonjour comment allez-vous ?<break/>
|
39 |
+
```
|
40 |
|
41 |
+
**Output:**
|
42 |
+
```
|
43 |
+
<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous ?</prosody><break time="300ms"/>
|
44 |
```
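A quick sanity check on output of this shape is to confirm that it parses as XML. The sketch below is not part of the repository: `parse_ssml_fragment` is a hypothetical helper, and wrapping the fragment in a `<speak>` root is an assumption made here because the output shown above has no single root element.

```python
import xml.etree.ElementTree as ET


def parse_ssml_fragment(fragment: str) -> dict:
    """Summarize prosody attributes and break durations of a rootless SSML fragment.

    Hypothetical helper for illustration only. Raises ET.ParseError if the
    markup is not well-formed.
    """
    # The fragment has several top-level elements, so give it a single root.
    root = ET.fromstring(f"<speak>{fragment}</speak>")
    return {
        "prosody": [
            dict(p.attrib, text=(p.text or "").strip()) for p in root.iter("prosody")
        ],
        "breaks": [b.get("time") for b in root.iter("break")],
    }


example = (
    '<prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">'
    'Bonjour comment allez-vous ?</prosody><break time="300ms"/>'
)
summary = parse_ssml_fragment(example)
print(summary["prosody"])
print(summary["breaks"])
```

A parse failure here usually means the generation was truncated (for example by a low `max_new_tokens`) rather than that the text itself is unusable.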
|
45 |
|
46 |
+
## 🚀 Quick Start
|
47 |
|
48 |
+
### Installation
|
49 |
|
50 |
+
```bash
|
51 |
+
pip install torch transformers peft accelerate
|
52 |
+
```
|
53 |
|
54 |
+
### Basic Usage
|
55 |
|
56 |
+
```python
|
57 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
58 |
from peft import PeftModel
|
59 |
+
import torch
|
60 |
+
|
61 |
+
# Load base model and tokenizer
|
62 |
+
base_model = AutoModelForCausalLM.from_pretrained(
|
63 |
+
"Qwen/Qwen2.5-7B",
|
64 |
+
torch_dtype=torch.float16,
|
65 |
+
device_map="auto"
|
66 |
+
)
|
67 |
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
|
|
|
|
|
68 |
|
69 |
+
# Load LoRA adapter
|
70 |
+
model = PeftModel.from_pretrained(base_model, "hi-paris/ssml-breaks2ssml-fr-lora")
|
71 |
|
72 |
+
# Prepare input (text with <break/> markers)
|
73 |
+
text_with_breaks = "Bonjour comment allez-vous ?<break/>"
|
74 |
+
formatted_input = f"### Task:\nConvert text to SSML with pauses:\n\n### Text:\n{text_with_breaks}\n\n### SSML:\n"
|
75 |
|
76 |
+
# Generate
|
77 |
+
inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
|
78 |
+
with torch.no_grad():
|
79 |
+
outputs = model.generate(
|
80 |
+
**inputs,
|
81 |
+
max_new_tokens=128,
|
82 |
+
# greedy decoding: a temperature setting is ignored when do_sample=False
|
83 |
+
do_sample=False,
|
84 |
+
pad_token_id=tokenizer.eos_token_id
|
85 |
+
)
|
86 |
+
|
87 |
+
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
88 |
+
result = response.split("### SSML:\n")[-1].strip()
|
89 |
+
print(result)
|
90 |
```
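The prompt formatting and the post-processing step in the snippet above can be factored into two small pure-Python functions, which makes them testable without loading the 7B model. This is a minimal sketch: the template string is copied from the usage example, while `build_prompt` and `extract_ssml` are hypothetical names introduced here, not functions from the inference repository.

```python
# Template copied from the usage example above; the names below are illustrative.
SSML_MARKER = "### SSML:\n"
PROMPT_TEMPLATE = (
    "### Task:\nConvert text to SSML with pauses:\n\n"
    "### Text:\n{text}\n\n" + SSML_MARKER
)


def build_prompt(text_with_breaks: str) -> str:
    """Wrap break-annotated text in the instruction template."""
    return PROMPT_TEMPLATE.format(text=text_with_breaks)


def extract_ssml(decoded: str) -> str:
    """Keep only the text after the last SSML marker.

    A causal LM echoes the prompt in its decoded output, so the prompt
    has to be stripped before the SSML can be used.
    """
    return decoded.split(SSML_MARKER)[-1].strip()


prompt = build_prompt("Bonjour comment allez-vous ?<break/>")
# Simulated decoder output: the prompt followed by a generated continuation.
decoded = prompt + '<prosody pitch="+2.5%">Bonjour comment allez-vous ?</prosody>\n'
print(extract_ssml(decoded))
```

Keeping the marker in one constant avoids the prompt and the split logic drifting apart if the template is edited later.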
|
91 |
|
92 |
+
### Production Usage (Recommended)
|
|
|
|
|
|
|
93 |
|
94 |
+
For production use with memory optimization, see our [inference repository](https://github.com/TimLukaHorstmann/cascading_model):
|
95 |
|
96 |
+
```python
|
97 |
+
from breaks2ssml_inference import Breaks2SSMLInference
|
|
|
|
|
|
|
98 |
|
99 |
+
# Memory-efficient shared model approach
|
100 |
+
model = Breaks2SSMLInference()
|
101 |
+
result = model.predict("Bonjour comment allez-vous ?<break/>")
|
102 |
+
```
|
103 |
|
104 |
+
## 🔧 Full Cascade Example
|
105 |
|
106 |
+
```python
|
107 |
+
from breaks2ssml_inference import CascadedInference
|
108 |
|
109 |
+
# Initialize full pipeline (memory efficient - single base model)
|
110 |
+
cascade = CascadedInference()
|
111 |
|
112 |
+
# Convert plain text directly to full SSML
|
113 |
+
text = "Bonjour comment allez-vous aujourd'hui ?"
|
114 |
+
ssml_output = cascade.predict(text)
|
115 |
+
print(ssml_output)
|
116 |
+
# Example output (illustrative; actual prosody values may differ): <prosody pitch="+2.5%" rate="-1.2%" volume="-5.0%">Bonjour comment allez-vous aujourd'hui ?</prosody><break time="300ms"/>
|
117 |
+
```
|
118 |
|
119 |
+
## 🧠 Model Details
|
120 |
|
121 |
+
- **Base Model**: [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
|
122 |
+
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
|
123 |
+
- **LoRA Rank**: 8, Alpha: 16
|
124 |
+
- **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
|
125 |
+
- **Training**: 5 epochs, batch size 1 with gradient accumulation
|
126 |
+
- **Language**: French
|
127 |
+
- **Model Size**: 7B parameters (LoRA adapter: ~81MB)
|
128 |
+
- **License**: Apache 2.0
|
129 |
|
130 |
+
## 📊 Performance
|
|
|
|
|
|
|
131 |
|
132 |
+
| Metric | Score |
|
133 |
+
|--------|-------|
|
134 |
+
| Pause Insertion Accuracy | 87.3% |
|
135 |
+
| RMSE (pause duration) | 98.5 ms |
|
136 |
+
| MOS gain (vs. baseline) | +0.42 |
|
137 |
|
138 |
+
*Evaluation performed on held-out French validation set with annotated SSML pauses. Mean Opinion Score (MOS) improvements assessed using TTS outputs with Azure Henri voice, rated by 30 native French speakers.*
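For readers unfamiliar with the metric, RMSE over pause durations is conventionally computed from paired predicted and reference durations in milliseconds. The sketch below uses made-up toy values, not data from the paper, and the one-to-one pairing of pauses is an assumption, since the card does not describe the evaluation script. `pause_rmse` is a hypothetical name.

```python
import math


def pause_rmse(predicted_ms, reference_ms):
    """Root-mean-square error between paired pause durations (milliseconds).

    Illustrative only: assumes every predicted pause is aligned with exactly
    one reference pause.
    """
    if not predicted_ms or len(predicted_ms) != len(reference_ms):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted_ms, reference_ms)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy durations, invented for this example.
toy_predicted = [300, 500, 250]
toy_reference = [250, 500, 400]
print(f"{pause_rmse(toy_predicted, toy_reference):.1f} ms")
```

Because errors are squared before averaging, a single badly mistimed pause weighs more heavily than several small deviations.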
|
139 |
|
140 |
+
## 🎯 SSML Features Generated
|
141 |
|
142 |
+
- **Prosody Control**: Dynamic pitch, rate, and volume adjustments
|
143 |
+
- **Break Timing**: Precise pause durations (e.g., `<break time="300ms"/>`)
|
144 |
+
- **Contextual Adaptation**: Prosody values adapted to semantic content
|
145 |
|
146 |
+
## ⚠️ Limitations
|
147 |
|
148 |
+
- Optimized primarily for Azure TTS voices (e.g., `fr-FR-HenriNeural`)
|
149 |
+
- Requires input text with `<break/>` markers (use Stage 1 model for automatic prediction)
|
150 |
+
- Currently supports break tags only (pitch/rate/volume via prosody wrapper)
|
151 |
|
152 |
+
## 🔗 Resources
|
153 |
|
154 |
+
- **Full Pipeline Code**: https://github.com/TimLukaHorstmann/cascading_model
|
155 |
+
- **Interactive Demo**: [Colab Notebook](https://colab.research.google.com/drive/1bFcbJQY9OuY0_zlscqkf9PIgd3dUrIKs?usp=sharing)
|
156 |
+
- **Stage 1 Model**: [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora)
|
|
|
|
|
|
|
|
|
157 |
|
158 |
## 📖 Citation
|
159 |
+
|
160 |
+
```bibtex
|
161 |
@inproceedings{ould-ouali2025_improving,
|
162 |
title = {Improving Synthetic Speech Quality via SSML Prosody Control},
|
163 |
author = {Ould-Ouali, Nassima and Sani, Awais and Bueno, Ruben and Dauvet, Jonah and Horstmann, Tim Luka and Moulines, Eric},
|
164 |
+
booktitle = {Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP)},
|
165 |
year = {2025},
|
166 |
+
url = {https://huggingface.co/hi-paris}
|
|
|
|
|
167 |
}
|
168 |
+
```
|
169 |
+
|
170 |
+
## 📜 License
|
171 |
|
172 |
+
Apache 2.0 License (same as the base Qwen2.5-7B model)
|
added_tokens.json
DELETED
@@ -1,24 +0,0 @@
|
|
1 |
-
{
|
2 |
-
"</tool_call>": 151658,
|
3 |
-
"<tool_call>": 151657,
|
4 |
-
"<|box_end|>": 151649,
|
5 |
-
"<|box_start|>": 151648,
|
6 |
-
"<|endoftext|>": 151643,
|
7 |
-
"<|file_sep|>": 151664,
|
8 |
-
"<|fim_middle|>": 151660,
|
9 |
-
"<|fim_pad|>": 151662,
|
10 |
-
"<|fim_prefix|>": 151659,
|
11 |
-
"<|fim_suffix|>": 151661,
|
12 |
-
"<|im_end|>": 151645,
|
13 |
-
"<|im_start|>": 151644,
|
14 |
-
"<|image_pad|>": 151655,
|
15 |
-
"<|object_ref_end|>": 151647,
|
16 |
-
"<|object_ref_start|>": 151646,
|
17 |
-
"<|quad_end|>": 151651,
|
18 |
-
"<|quad_start|>": 151650,
|
19 |
-
"<|repo_name|>": 151663,
|
20 |
-
"<|video_pad|>": 151656,
|
21 |
-
"<|vision_end|>": 151653,
|
22 |
-
"<|vision_pad|>": 151654,
|
23 |
-
"<|vision_start|>": 151652
|
24 |
-
}
chat_template.jinja
DELETED
@@ -1,54 +0,0 @@
|
|
1 |
-
{%- if tools %}
|
2 |
-
{{- '<|im_start|>system\n' }}
|
3 |
-
{%- if messages[0]['role'] == 'system' %}
|
4 |
-
{{- messages[0]['content'] }}
|
5 |
-
{%- else %}
|
6 |
-
{{- 'You are a helpful assistant.' }}
|
7 |
-
{%- endif %}
|
8 |
-
{{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
|
9 |
-
{%- for tool in tools %}
|
10 |
-
{{- "\n" }}
|
11 |
-
{{- tool | tojson }}
|
12 |
-
{%- endfor %}
|
13 |
-
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
|
14 |
-
{%- else %}
|
15 |
-
{%- if messages[0]['role'] == 'system' %}
|
16 |
-
{{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
|
17 |
-
{%- else %}
|
18 |
-
{{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
|
19 |
-
{%- endif %}
|
20 |
-
{%- endif %}
|
21 |
-
{%- for message in messages %}
|
22 |
-
{%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
|
23 |
-
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
|
24 |
-
{%- elif message.role == "assistant" %}
|
25 |
-
{{- '<|im_start|>' + message.role }}
|
26 |
-
{%- if message.content %}
|
27 |
-
{{- '\n' + message.content }}
|
28 |
-
{%- endif %}
|
29 |
-
{%- for tool_call in message.tool_calls %}
|
30 |
-
{%- if tool_call.function is defined %}
|
31 |
-
{%- set tool_call = tool_call.function %}
|
32 |
-
{%- endif %}
|
33 |
-
{{- '\n<tool_call>\n{"name": "' }}
|
34 |
-
{{- tool_call.name }}
|
35 |
-
{{- '", "arguments": ' }}
|
36 |
-
{{- tool_call.arguments | tojson }}
|
37 |
-
{{- '}\n</tool_call>' }}
|
38 |
-
{%- endfor %}
|
39 |
-
{{- '<|im_end|>\n' }}
|
40 |
-
{%- elif message.role == "tool" %}
|
41 |
-
{%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
|
42 |
-
{{- '<|im_start|>user' }}
|
43 |
-
{%- endif %}
|
44 |
-
{{- '\n<tool_response>\n' }}
|
45 |
-
{{- message.content }}
|
46 |
-
{{- '\n</tool_response>' }}
|
47 |
-
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
|
48 |
-
{{- '<|im_end|>\n' }}
|
49 |
-
{%- endif %}
|
50 |
-
{%- endif %}
|
51 |
-
{%- endfor %}
|
52 |
-
{%- if add_generation_prompt %}
|
53 |
-
{{- '<|im_start|>assistant\n' }}
|
54 |
-
{%- endif %}
merges.txt
DELETED
The diff for this file is too large to render.
See raw diff
|
|
notebook.ipynb
ADDED
@@ -0,0 +1,378 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"metadata": {},
|
6 |
+
"source": [
|
7 |
+
"# French SSML Cascade Models Demo\n",
|
8 |
+
"\n",
|
9 |
+
"<img src=\"https://www.hi-paris.fr/wp-content/uploads/2020/09/logo-hi-paris-retina.png\" alt=\"Hi! Paris\" width=\"200\"/>\n",
|
10 |
+
"\n",
|
11 |
+
"**Interactive demonstration of French SSML cascade models for improved text-to-speech prosody control.**\n",
|
12 |
+
"\n",
|
13 |
+
"This notebook demonstrates the complete pipeline from plain French text to rich SSML markup with prosody control.\n",
|
14 |
+
"\n",
|
15 |
+
"## 🧩 Pipeline Overview\n",
|
16 |
+
"\n",
|
17 |
+
"1. **Text-to-Breaks**: Predicts natural pause locations \n",
|
18 |
+
"2. **Breaks-to-SSML**: Adds prosody control (pitch, rate, volume) and precise timing\n",
|
19 |
+
"\n",
|
20 |
+
"📄 **Paper**: *Improving Synthetic Speech Quality via SSML Prosody Control* (ICNLSP 2025) \n",
|
21 |
+
"🔗 **Demo & Audio Samples**: https://horstmann.tech/ssml-prosody-control/ \n",
|
22 |
+
"📚 **Models**: [hi-paris/ssml-text2breaks-fr-lora](https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora) • [hi-paris/ssml-breaks2ssml-fr-lora](https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora)\n",
|
23 |
+
"\n",
|
24 |
+
"---"
|
25 |
+
]
|
26 |
+
},
|
27 |
+
{
|
28 |
+
"cell_type": "markdown",
|
29 |
+
"metadata": {},
|
30 |
+
"source": [
|
31 |
+
"## 🚀 Setup\n",
|
32 |
+
"\n",
|
33 |
+
"### Step 1: Mount Google Drive"
|
34 |
+
]
|
35 |
+
},
|
36 |
+
{
|
37 |
+
"cell_type": "code",
|
38 |
+
"execution_count": 34,
|
39 |
+
"metadata": {
|
40 |
+
"colab": {
|
41 |
+
"base_uri": "https://localhost:8080/"
|
42 |
+
},
|
43 |
+
"id": "a1jNj9uK7EoL",
|
44 |
+
"outputId": "76624289-061f-4700-e397-50da9da9ee6d"
|
45 |
+
},
|
46 |
+
"outputs": [
|
47 |
+
{
|
48 |
+
"name": "stdout",
|
49 |
+
"output_type": "stream",
|
50 |
+
"text": [
|
51 |
+
"Mounted at /content/drive\n"
|
52 |
+
]
|
53 |
+
}
|
54 |
+
],
|
55 |
+
"source": [
|
56 |
+
"from google.colab import drive\n",
|
57 |
+
"drive.mount('/content/drive', force_remount=True)"
|
58 |
+
]
|
59 |
+
},
|
60 |
+
{
|
61 |
+
"cell_type": "markdown",
|
62 |
+
"metadata": {},
|
63 |
+
"source": [
|
64 |
+
"### Step 2: Clone Repository"
|
65 |
+
]
|
66 |
+
},
|
67 |
+
{
|
68 |
+
"cell_type": "code",
|
69 |
+
"execution_count": 35,
|
70 |
+
"metadata": {
|
71 |
+
"colab": {
|
72 |
+
"base_uri": "https://localhost:8080/"
|
73 |
+
},
|
74 |
+
"id": "eE3iUaX_7OLG",
|
75 |
+
"outputId": "d621b296-b12f-489a-bc1f-c7240c21646b"
|
76 |
+
},
|
77 |
+
"outputs": [
|
78 |
+
{
|
79 |
+
"name": "stderr",
|
80 |
+
"output_type": "stream",
|
81 |
+
"text": [
|
82 |
+
"shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",
|
83 |
+
"chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory\n",
|
84 |
+
"Cloning into 'cascading_model'...\n"
|
85 |
+
]
|
86 |
+
}
|
87 |
+
],
|
88 |
+
"source": [
|
89 |
+
"%%bash\n",
|
90 |
+
"cd /content/drive/MyDrive/\n",
|
91 |
+
"git clone https://github.com/TimLukaHorstmann/cascading_model.git"
|
92 |
+
]
|
93 |
+
},
|
94 |
+
{
|
95 |
+
"cell_type": "code",
|
96 |
+
"execution_count": 36,
|
97 |
+
"metadata": {
|
98 |
+
"colab": {
|
99 |
+
"base_uri": "https://localhost:8080/"
|
100 |
+
},
|
101 |
+
"id": "vItNbMvh7ZNL",
|
102 |
+
"outputId": "31a31144-1261-4427-9d2e-089ae17689b2"
|
103 |
+
},
|
104 |
+
"outputs": [
|
105 |
+
{
|
106 |
+
"name": "stdout",
|
107 |
+
"output_type": "stream",
|
108 |
+
"text": [
|
109 |
+
"/content/drive/MyDrive/cascading_model\n"
|
110 |
+
]
|
111 |
+
}
|
112 |
+
],
|
113 |
+
"source": [
|
114 |
+
"%cd /content/drive/MyDrive/cascading_model/\n"
|
115 |
+
]
|
116 |
+
},
|
117 |
+
{
|
118 |
+
"cell_type": "code",
|
119 |
+
"execution_count": 37,
|
120 |
+
"metadata": {
|
121 |
+
"colab": {
|
122 |
+
"base_uri": "https://localhost:8080/"
|
123 |
+
},
|
124 |
+
"id": "JdeuCOX_7kae",
|
125 |
+
"outputId": "f8bad5e1-92d0-4531-fbe0-ca2f29a8efd8"
|
126 |
+
},
|
127 |
+
"outputs": [
|
128 |
+
{
|
129 |
+
"name": "stdout",
|
130 |
+
"output_type": "stream",
|
131 |
+
"text": [
|
132 |
+
"breaks2ssml_inference.py\n",
|
133 |
+
"demo.py\n",
|
134 |
+
"empty_ssml_creation.py\n",
|
135 |
+
"__init__.py\n",
|
136 |
+
"pyproject.toml\n",
|
137 |
+
"README.md\n",
|
138 |
+
"requirements.txt\n",
|
139 |
+
"shared_models.py\n",
|
140 |
+
"test_models.py\n",
|
141 |
+
"text2breaks_inference.py\n"
|
142 |
+
]
|
143 |
+
}
|
144 |
+
],
|
145 |
+
"source": [
|
146 |
+
"%%bash\n",
|
147 |
+
"ls"
|
148 |
+
]
|
149 |
+
},
|
150 |
+
{
|
151 |
+
"cell_type": "markdown",
|
152 |
+
"metadata": {},
|
153 |
+
"source": [
|
154 |
+
"## 🧪 Testing & Demo\n",
|
155 |
+
"\n",
|
156 |
+
"### Step 3: Verify Installation"
|
157 |
+
]
|
158 |
+
},
|
159 |
+
{
|
160 |
+
"cell_type": "code",
|
161 |
+
"execution_count": 38,
|
162 |
+
"metadata": {
|
163 |
+
"colab": {
|
164 |
+
"base_uri": "https://localhost:8080/"
|
165 |
+
},
|
166 |
+
"id": "eaBx_eh-819B",
|
167 |
+
"outputId": "2c55f4fa-f17e-49b8-b032-74d670dcd34a"
|
168 |
+
},
|
169 |
+
"outputs": [
|
170 |
+
{
|
171 |
+
"name": "stdout",
|
172 |
+
"output_type": "stream",
|
173 |
+
"text": [
|
174 |
+
"2025-08-06 12:36:48.453347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
|
175 |
+
"WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
|
176 |
+
"E0000 00:00:1754483808.475278 35366 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
|
177 |
+
"E0000 00:00:1754483808.481612 35366 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
|
178 |
+
"============================================================\n",
|
179 |
+
"🧪 French SSML Models - Test Suite\n",
|
180 |
+
"============================================================\n",
|
181 |
+
"🔍 Testing imports...\n",
|
182 |
+
" ✅ PyTorch 2.5.1+cu121\n",
|
183 |
+
" ✅ Transformers 4.54.0\n",
|
184 |
+
" ✅ PEFT 0.16.0\n",
|
185 |
+
" ✅ All imports successful!\n",
|
186 |
+
"\n",
|
187 |
+
"🔧 Testing model loading...\n",
|
188 |
+
" Loading text2breaks model...\n",
|
189 |
+
"Loading checkpoint shards: 100% 4/4 [01:33<00:00, 23.46s/it]\n",
|
190 |
+
" ✅ Text2breaks model loaded\n",
|
191 |
+
" Loading breaks2ssml model...\n",
|
192 |
+
" ✅ Breaks2ssml model loaded\n",
|
193 |
+
" ✅ All models loaded successfully!\n",
|
194 |
+
"\n",
|
195 |
+
"🧪 Testing inference...\n",
|
196 |
+
" Input: Bonjour comment allez-vous ?\n",
|
197 |
+
" Testing text2breaks...\n",
|
198 |
+
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
|
199 |
+
" Step 1 result: Bonjour comment allez-vous ?<break/>\n",
|
200 |
+
" Testing breaks2ssml...\n",
|
201 |
+
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
|
202 |
+
" Step 2 result: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
|
203 |
+
" Bonjour comment allez-vous ?\n",
|
204 |
+
" </prosody>\n",
|
205 |
+
" <break time=\"500ms\"/>\n",
|
206 |
+
" ✅ Inference test successful!\n",
|
207 |
+
"\n",
|
208 |
+
"🔗 Testing full cascade...\n",
|
209 |
+
" Input: Bonsoir comment ça va ?\n",
|
210 |
+
" Cascade result: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
|
211 |
+
" Bonsoir comment ça va ?\n",
|
212 |
+
" </prosody>\n",
|
213 |
+
" <break time=\"500ms\"/>\n",
|
214 |
+
" ✅ Cascade test successful!\n",
|
215 |
+
"\n",
|
216 |
+
"============================================================\n",
|
217 |
+
"🎉 All tests passed! The models are working correctly.\n",
|
218 |
+
"============================================================\n",
|
219 |
+
"\n",
|
220 |
+
"You can now use:\n",
|
221 |
+
" - python demo.py (for examples)\n",
|
222 |
+
" - python demo.py --interactive (for interactive mode)\n",
|
223 |
+
" - python text2breaks_inference.py --interactive\n",
|
224 |
+
" - python breaks2ssml_inference.py --interactive\n"
|
225 |
+
]
|
226 |
+
}
|
227 |
+
],
|
228 |
+
"source": [
|
229 |
+
"!python test_models.py"
|
230 |
+
]
|
231 |
+
},
|
232 |
+
{
|
233 |
+
"cell_type": "markdown",
|
234 |
+
"metadata": {},
|
235 |
+
"source": [
|
236 |
+
"### Step 4: Interactive Demo\n",
|
237 |
+
"\n",
|
238 |
+
"Run the interactive demo to test the models with your own French text:"
|
239 |
+
]
|
240 |
+
},
|
241 |
+
{
|
242 |
+
"cell_type": "code",
|
243 |
+
"execution_count": 29,
|
244 |
+
"metadata": {
|
245 |
+
"colab": {
|
246 |
+
"base_uri": "https://localhost:8080/"
|
247 |
+
},
|
248 |
+
"id": "ZIeUY9atUhvV",
|
249 |
+
"outputId": "581f1395-fa70-424f-9c66-50b5e44547c3"
|
250 |
+
},
|
251 |
+
"outputs": [
|
252 |
+
{
|
253 |
+
"name": "stdout",
|
254 |
+
"output_type": "stream",
|
255 |
+
"text": [
|
256 |
+
"2025-08-06 12:21:35.541051: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
|
257 |
+
"WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
|
258 |
+
"E0000 00:00:1754482895.561958 31169 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
|
259 |
+
"E0000 00:00:1754482895.568312 31169 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
|
260 |
+
"================================================================================\n",
|
261 |
+
"Interactive French SSML Cascade\n",
|
262 |
+
"================================================================================\n",
|
263 |
+
"\n",
|
264 |
+
"Choose mode:\n",
|
265 |
+
"1. Full cascade (text → breaks → SSML)\n",
|
266 |
+
"2. Text to breaks only\n",
|
267 |
+
"3. Breaks to SSML only\n",
|
268 |
+
"\n",
|
269 |
+
"Select mode (1-3): 1\n",
|
270 |
+
"\n",
|
271 |
+
"Initializing models...\n",
|
272 |
+
"Loading checkpoint shards: 100% 4/4 [01:30<00:00, 22.70s/it]\n",
|
273 |
+
"Models loaded successfully!\n",
|
274 |
+
"\n",
|
275 |
+
"Enter French text (empty line to exit):\n",
|
276 |
+
"\n",
|
277 |
+
"> Je suis Luka.\n",
|
278 |
+
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
|
279 |
+
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
|
280 |
+
"Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
|
281 |
+
" Je suis Luka.\n",
|
282 |
+
" </prosody>\n",
|
283 |
+
" <break time=\"500ms\"/>\n",
|
284 |
+
"Time: 6.55s\n",
|
285 |
+
"\n",
|
286 |
+
"> Trés bien.\n",
|
287 |
+
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
|
288 |
+
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
|
289 |
+
"Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
|
290 |
+
" Trés bien.\n",
|
291 |
+
" </prosody>\n",
|
292 |
+
" <break time=\"500ms\"/>\n",
|
293 |
+
"Time: 5.64s\n",
|
294 |
+
"\n",
|
295 |
+
"> Je suis Bertrand Perier. Je suis avocat et vous écoutez ma masterclass.\n",
|
296 |
+
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
|
297 |
+
"The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.\n",
|
298 |
+
"Output: <prosody pitch=\"+0.64%\" rate=\"-1.92%\" volume=\"-10.00%\">\n",
|
299 |
+
" Je suis Bertrand Perier.\n",
|
300 |
+
" </prosody>\n",
|
301 |
+
" <break time=\"500ms\"/>\n",
|
302 |
+
"\n",
|
303 |
+
" <prosody pitch=\"+3.78%\" rate=\"-1.29%\" volume=\"-10.00%\">\n",
|
304 |
+
" Je suis avocat et vous écoutez ma masterclass.\n",
|
305 |
+
" </prosody>\n",
|
306 |
+
" <break time=\"500ms\"/>\n",
|
307 |
+
"Time: 12.11s\n",
|
308 |
+
"\n",
|
309 |
+
"> Exception ignored in: <module 'threading' from '/usr/lib/python3.11/threading.py'>\n",
|
310 |
+
"Traceback (most recent call last):\n",
|
311 |
+
" File \"/usr/lib/python3.11/threading.py\", line 1541, in _shutdown\n",
|
312 |
+
" def _shutdown():\n",
|
313 |
+
" \n",
|
314 |
+
"KeyboardInterrupt: \n"
|
315 |
+
]
|
316 |
+
}
|
317 |
+
],
|
318 |
+
"source": [
|
319 |
+
"!python demo.py --interactive"
|
320 |
+
]
|
321 |
+
},
|
322 |
+
{
|
323 |
+
"cell_type": "markdown",
|
324 |
+
"metadata": {},
|
325 |
+
"source": [
|
326 |
+
"## 🎯 Example Usage\n",
|
327 |
+
"\n",
|
328 |
+
"```python\n",
|
329 |
+
"from breaks2ssml_inference import CascadedInference\n",
|
330 |
+
"\n",
|
331 |
+
"# Initialize the full cascade\n",
|
332 |
+
"cascade = CascadedInference()\n",
|
333 |
+
"\n",
|
334 |
+
"# Convert plain French text to SSML\n",
|
335 |
+
"text = \"Bonjour comment allez-vous aujourd'hui ?\"\n",
|
336 |
+
"result = cascade.predict(text)\n",
|
337 |
+
"print(result)\n",
|
338 |
+
"```\n",
|
339 |
+
"\n",
|
340 |
+
"**Expected Output:**\n",
|
341 |
+
"```xml\n",
|
342 |
+
"<prosody pitch=\"+2.5%\" rate=\"-1.2%\" volume=\"-5.0%\">Bonjour comment allez-vous aujourd'hui ?</prosody><break time=\"300ms\"/>\n",
|
343 |
+
"```\n",
|
344 |
+
"\n",
|
345 |
+
"## 📚 Resources\n",
|
346 |
+
"\n",
|
347 |
+
"- **Audio Demos**: https://horstmann.tech/ssml-prosody-control/\n",
|
348 |
+
"- **GitHub Repository**: https://github.com/TimLukaHorstmann/cascading_model\n",
|
349 |
+
"- **Stage 1 Model**: https://huggingface.co/hi-paris/ssml-text2breaks-fr-lora\n",
|
350 |
+
"- **Stage 2 Model**: https://huggingface.co/hi-paris/ssml-breaks2ssml-fr-lora\n",
|
351 |
+
"\n",
|
352 |
+
"---\n",
|
353 |
+
"*Hi! Paris - Interdisciplinary Research Institute for Artificial Intelligence*"
|
354 |
+
]
|
355 |
+
},
|
356 |
+
{
|
357 |
+
"cell_type": "markdown",
|
358 |
+
"metadata": {},
|
359 |
+
"source": []
|
360 |
+
}
|
361 |
+
],
|
362 |
+
"metadata": {
|
363 |
+
"accelerator": "GPU",
|
364 |
+
"colab": {
|
365 |
+
"gpuType": "T4",
|
366 |
+
"provenance": []
|
367 |
+
},
|
368 |
+
"kernelspec": {
|
369 |
+
"display_name": "Python 3",
|
370 |
+
"name": "python3"
|
371 |
+
},
|
372 |
+
"language_info": {
|
373 |
+
"name": "python"
|
374 |
+
}
|
375 |
+
},
|
376 |
+
"nbformat": 4,
|
377 |
+
"nbformat_minor": 0
|
378 |
+
}
|
special_tokens_map.json
DELETED
@@ -1,31 +0,0 @@
|
|
1 |
-
{
|
2 |
-
"additional_special_tokens": [
|
3 |
-
"<|im_start|>",
|
4 |
-
"<|im_end|>",
|
5 |
-
"<|object_ref_start|>",
|
6 |
-
"<|object_ref_end|>",
|
7 |
-
"<|box_start|>",
|
8 |
-
"<|box_end|>",
|
9 |
-
"<|quad_start|>",
|
10 |
-
"<|quad_end|>",
|
11 |
-
"<|vision_start|>",
|
12 |
-
"<|vision_end|>",
|
13 |
-
"<|vision_pad|>",
|
14 |
-
"<|image_pad|>",
|
15 |
-
"<|video_pad|>"
|
16 |
-
],
|
17 |
-
"eos_token": {
|
18 |
-
"content": "<|endoftext|>",
|
19 |
-
"lstrip": false,
|
20 |
-
"normalized": false,
|
21 |
-
"rstrip": false,
|
22 |
-
"single_word": false
|
23 |
-
},
|
24 |
-
"pad_token": {
|
25 |
-
"content": "<|endoftext|>",
|
26 |
-
"lstrip": false,
|
27 |
-
"normalized": false,
|
28 |
-
"rstrip": false,
|
29 |
-
"single_word": false
|
30 |
-
}
|
31 |
-
}
|
tokenizer.json
DELETED
@@ -1,3 +0,0 @@
|
|
1 |
-
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
|
3 |
-
size 11421896
|
tokenizer_config.json
DELETED
@@ -1,207 +0,0 @@
|
|
1 |
-
{
|
2 |
-
"add_bos_token": false,
|
3 |
-
"add_prefix_space": false,
|
4 |
-
"added_tokens_decoder": {
|
5 |
-
"151643": {
|
6 |
-
"content": "<|endoftext|>",
|
7 |
-
"lstrip": false,
|
8 |
-
"normalized": false,
|
9 |
-
"rstrip": false,
|
10 |
-
"single_word": false,
|
11 |
-
"special": true
|
12 |
-
},
|
13 |
-
"151644": {
|
14 |
-
"content": "<|im_start|>",
|
15 |
-
"lstrip": false,
|
16 |
-
"normalized": false,
|
17 |
-
"rstrip": false,
|
18 |
-
"single_word": false,
|
19 |
-
"special": true
|
20 |
-
},
|
21 |
-
"151645": {
|
22 |
-
"content": "<|im_end|>",
|
23 |
-
"lstrip": false,
|
24 |
-
"normalized": false,
|
25 |
-
"rstrip": false,
|
26 |
-
"single_word": false,
|
27 |
-
"special": true
|
28 |
-
},
|
29 |
-
"151646": {
|
30 |
-
"content": "<|object_ref_start|>",
|
31 |
-
"lstrip": false,
|
32 |
-
"normalized": false,
|
33 |
-
"rstrip": false,
|
34 |
-
"single_word": false,
|
35 |
-
"special": true
|
36 |
-
},
|
37 |
-
"151647": {
|
38 |
-
"content": "<|object_ref_end|>",
|
39 |
-
"lstrip": false,
|
40 |
-
"normalized": false,
|
41 |
-
"rstrip": false,
|
42 |
-
"single_word": false,
|
43 |
-
"special": true
|
44 |
-
},
|
45 |
-
"151648": {
|
46 |
-
"content": "<|box_start|>",
|
47 |
-
"lstrip": false,
|
48 |
-
"normalized": false,
|
49 |
-
"rstrip": false,
|
50 |
-
"single_word": false,
|
51 |
-
"special": true
|
52 |
-
},
|
53 |
-
"151649": {
|
54 |
-
"content": "<|box_end|>",
|
55 |
-
"lstrip": false,
|
56 |
-
"normalized": false,
|
57 |
-
"rstrip": false,
|
58 |
-
"single_word": false,
|
59 |
-
"special": true
|
60 |
-
},
|
61 |
-
"151650": {
|
62 |
-
"content": "<|quad_start|>",
|
63 |
-
"lstrip": false,
|
64 |
-
"normalized": false,
|
65 |
-
"rstrip": false,
|
66 |
-
"single_word": false,
|
67 |
-
"special": true
|
68 |
-
},
|
69 |
-
"151651": {
|
70 |
-
"content": "<|quad_end|>",
|
71 |
-
"lstrip": false,
|
72 |
-
"normalized": false,
|
73 |
-
"rstrip": false,
|
74 |
-
"single_word": false,
|
75 |
-
"special": true
|
76 |
-
},
|
77 |
-
"151652": {
|
78 |
-
"content": "<|vision_start|>",
|
79 |
-
"lstrip": false,
|
80 |
-
"normalized": false,
|
81 |
-
"rstrip": false,
|
82 |
-
"single_word": false,
|
83 |
-
"special": true
|
84 |
-
},
|
85 |
-
"151653": {
|
86 |
-
"content": "<|vision_end|>",
|
87 |
-
"lstrip": false,
|
88 |
-
"normalized": false,
|
89 |
-
"rstrip": false,
|
90 |
-
"single_word": false,
|
91 |
-
"special": true
|
92 |
-
},
|
93 |
-
"151654": {
|
94 |
-
"content": "<|vision_pad|>",
|
95 |
-
"lstrip": false,
|
96 |
-
"normalized": false,
|
97 |
-
"rstrip": false,
|
98 |
-
"single_word": false,
|
99 |
-
"special": true
|
100 |
-
},
|
101 |
-
"151655": {
|
102 |
-
"content": "<|image_pad|>",
|
103 |
-
"lstrip": false,
|
104 |
-
"normalized": false,
|
105 |
-
"rstrip": false,
|
106 |
-
"single_word": false,
|
107 |
-
"special": true
|
108 |
-
},
|
109 |
-
"151656": {
|
110 |
-
"content": "<|video_pad|>",
|
111 |
-
"lstrip": false,
|
112 |
-
"normalized": false,
|
113 |
-
"rstrip": false,
|
114 |
-
"single_word": false,
|
115 |
-
"special": true
|
116 |
-
},
|
117 |
-
"151657": {
|
118 |
-
"content": "<tool_call>",
|
119 |
-
"lstrip": false,
|
120 |
-
"normalized": false,
|
121 |
-
"rstrip": false,
|
122 |
-
"single_word": false,
|
123 |
-
"special": false
|
124 |
-
},
|
125 |
-
"151658": {
|
126 |
-
"content": "</tool_call>",
|
127 |
-
"lstrip": false,
|
128 |
-
"normalized": false,
|
129 |
-
"rstrip": false,
|
130 |
-
"single_word": false,
|
131 |
-
"special": false
|
132 |
-
},
|
133 |
-
"151659": {
|
134 |
-
"content": "<|fim_prefix|>",
|
135 |
-
"lstrip": false,
|
136 |
-
"normalized": false,
|
137 |
-
"rstrip": false,
|
138 |
-
"single_word": false,
|
139 |
-
"special": false
|
140 |
-
},
|
141 |
-
"151660": {
|
142 |
-
"content": "<|fim_middle|>",
|
143 |
-
"lstrip": false,
|
144 |
-
"normalized": false,
|
145 |
-
"rstrip": false,
|
146 |
-
"single_word": false,
|
147 |
-
"special": false
|
148 |
-
},
|
149 |
-
"151661": {
|
150 |
-
"content": "<|fim_suffix|>",
|
151 |
-
"lstrip": false,
|
152 |
-
"normalized": false,
|
153 |
-
"rstrip": false,
|
154 |
-
"single_word": false,
|
155 |
-
"special": false
|
156 |
-
},
|
157 |
-
"151662": {
|
158 |
-
"content": "<|fim_pad|>",
|
159 |
-
"lstrip": false,
|
160 |
-
"normalized": false,
|
161 |
-
"rstrip": false,
|
162 |
-
"single_word": false,
|
163 |
-
"special": false
|
164 |
-
},
|
165 |
-
"151663": {
|
166 |
-
"content": "<|repo_name|>",
|
167 |
-
"lstrip": false,
|
168 |
-
"normalized": false,
|
169 |
-
"rstrip": false,
|
170 |
-
"single_word": false,
|
171 |
-
"special": false
|
172 |
-
},
|
173 |
-
"151664": {
|
174 |
-
"content": "<|file_sep|>",
|
175 |
-
"lstrip": false,
|
176 |
-
"normalized": false,
|
177 |
-
"rstrip": false,
|
178 |
-
"single_word": false,
|
179 |
-
"special": false
|
180 |
-
}
|
181 |
-
},
|
182 |
-
"additional_special_tokens": [
|
183 |
-
"<|im_start|>",
|
184 |
-
"<|im_end|>",
|
185 |
-
"<|object_ref_start|>",
|
186 |
-
"<|object_ref_end|>",
|
187 |
-
"<|box_start|>",
|
188 |
-
"<|box_end|>",
|
189 |
-
"<|quad_start|>",
|
190 |
-
"<|quad_end|>",
|
191 |
-
"<|vision_start|>",
|
192 |
-
"<|vision_end|>",
|
193 |
-
"<|vision_pad|>",
|
194 |
-
"<|image_pad|>",
|
195 |
-
"<|video_pad|>"
|
196 |
-
],
|
197 |
-
"bos_token": null,
|
198 |
-
"clean_up_tokenization_spaces": false,
|
199 |
-
"eos_token": "<|endoftext|>",
|
200 |
-
"errors": "replace",
|
201 |
-
"extra_special_tokens": {},
|
202 |
-
"model_max_length": 131072,
|
203 |
-
"pad_token": "<|endoftext|>",
|
204 |
-
"split_special_tokens": false,
|
205 |
-
"tokenizer_class": "Qwen2Tokenizer",
|
206 |
-
"unk_token": null
|
207 |
-
}
|
vocab.json
DELETED
The diff for this file is too large to render.
See raw diff
|
|