## Work In Progress
- This is a checkpoint of [HKUSTAudio/Llasa-1B-Multilingual](https://huggingface.co/HKUSTAudio/Llasa-1B-Multilingual) fine-tuned on Cantonese audio data.
- Two additional tokens, `<|YUE_START|>` and `<|YUE_END|>`, have been added. The chat template is:
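```python
input_text = "..."  # Cantonese text to synthesize
# Speech tokens of an optional prompt audio clip (e.g. "<|s_123|>" strings); leave empty for no prompt
speech_ids_prefix = []

formatted_text = f"<|TEXT_UNDERSTANDING_START|><|YUE_START|>{input_text}<|YUE_END|><|TEXT_UNDERSTANDING_END|>"
chat = [
    {"role": "user", "content": "Convert the text to speech:" + formatted_text},
    {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>" + "".join(speech_ids_prefix)},
]
```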
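The helpers below are a minimal sketch of building the prompt above and reading speech codes back out of the model's output. They assume the `<|s_N|>` speech-token format used by the base Llasa model card; that format, and the helper names `ids_to_speech_tokens`, `build_chat` and `extract_speech_ids`, are illustrative assumptions, not part of this checkpoint's published code.

```python
import re

# Assumption: speech tokens follow the base Llasa "<|s_N|>" format, where N is an
# integer codec code. Not confirmed for this checkpoint.
SPEECH_TOKEN_RE = re.compile(r"<\|s_(\d+)\|>")


def ids_to_speech_tokens(speech_ids):
    """Hypothetical helper: map integer codec codes to speech-token strings."""
    return [f"<|s_{i}|>" for i in speech_ids]


def build_chat(input_text, prompt_speech_ids=()):
    """Hypothetical helper: wrap Cantonese text in the YUE chat template."""
    formatted_text = (
        "<|TEXT_UNDERSTANDING_START|><|YUE_START|>"
        f"{input_text}"
        "<|YUE_END|><|TEXT_UNDERSTANDING_END|>"
    )
    prefix = "".join(ids_to_speech_tokens(prompt_speech_ids))
    return [
        {"role": "user", "content": "Convert the text to speech:" + formatted_text},
        # The assistant turn is left open so generation continues the speech tokens.
        {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>" + prefix},
    ]


def extract_speech_ids(generated_text):
    """Hypothetical helper: recover integer codes from generated speech tokens."""
    return [int(code) for code in SPEECH_TOKEN_RE.findall(generated_text)]


if __name__ == "__main__":
    chat = build_chat("你好", [12, 345])
    print(chat[1]["content"])
    print(extract_speech_ids("<|s_12|><|s_345|><|SPEECH_GENERATION_END|>"))
```

In the base model's example, a `chat` list like this is presumably passed to `tokenizer.apply_chat_template` with the final assistant message left open for continuation, and the extracted codes are then decoded to audio with the XCodec2 codec. Treat those downstream steps as assumptions to check against the base model card.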
## Roadmap
- [ ] Train on more data
- [ ] Train with emotion and speaker-characteristic (gender, age) labels
- [ ] Benchmark with CER
- [ ] Gradio space
- [ ] Train with [LayerSkip](https://arxiv.org/abs/2404.16710)
- [ ] Train on better filtered data
- [ ] Release training code