alvanlii/canto-llasa-1b

## Work In Progress

- This is a checkpoint of [HKUSTAudio/Llasa-1B-Multilingual](https://huggingface.co/HKUSTAudio/Llasa-1B-Multilingual) fine-tuned on Cantonese audio data.
- Two additional tokens, `<|YUE_START|>` and `<|YUE_END|>`, are added. The chat template is:
```
# input_text: the text to synthesize
# speech_ids_prefix: speech-token strings to prepend to the generated speech
formatted_text = f"<|TEXT_UNDERSTANDING_START|><|YUE_START|>{input_text}<|YUE_END|><|TEXT_UNDERSTANDING_END|>"

chat = [
    {"role": "user", "content": "Convert the text to speech:" + formatted_text},
    {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>" + ''.join(speech_ids_prefix)}
]
```
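
The template above can be wrapped in a small helper so the marker tokens are not retyped by hand. This is a minimal sketch, not the training or inference code for this checkpoint: the function names `format_yue_text` and `build_chat` and the sample text are hypothetical, and it assumes the token strings use plain `|` characters.

```python
def format_yue_text(input_text: str) -> str:
    """Wrap text in the Cantonese and text-understanding marker tokens."""
    return (
        "<|TEXT_UNDERSTANDING_START|><|YUE_START|>"
        + input_text
        + "<|YUE_END|><|TEXT_UNDERSTANDING_END|>"
    )


def build_chat(input_text: str, speech_ids_prefix=()) -> list:
    """Build the two-turn chat from the template.

    speech_ids_prefix is a sequence of speech-token strings; it defaults
    to empty, which leaves only the speech-generation start marker in
    the assistant turn.
    """
    return [
        {
            "role": "user",
            "content": "Convert the text to speech:" + format_yue_text(input_text),
        },
        {
            "role": "assistant",
            "content": "<|SPEECH_GENERATION_START|>" + "".join(speech_ids_prefix),
        },
    ]


# Sample text is illustrative only.
chat = build_chat("你好")
print(chat[0]["content"])
print(chat[1]["content"])
```

The resulting `chat` list is in the shape a Hugging Face tokenizer's `apply_chat_template` accepts. Passing `continue_final_message=True` there should keep the assistant turn open so generation continues after `<|SPEECH_GENERATION_START|>`, though whether this checkpoint expects exactly that call is an assumption.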

## Roadmap
- [ ] Train on more data
- [ ] Train with emotions, speaker characteristics (gender, age)
- [ ] Benchmark with CER
- [ ] Gradio space
- [ ] Train with [LayerSkip](https://arxiv.org/abs/2404.16710)
- [ ] Train on better filtered data
- [ ] Release training code