alvanlii/canto-llasa-1b

Work In Progress

This is a fine-tuned checkpoint of HKUSTAudio/Llasa-1B-Multilingual, trained on Cantonese audio data.
Two additional tokens are added: <|YUE_START|> and <|YUE_END|>. The chat template is:

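# Placeholder inputs, assumed here for illustration only
input_text = "今日天氣好好"   # Cantonese text to synthesize
speech_ids_prefix = []        # speech-token strings from a reference prompt; empty for plain TTS

# Wrap the text in the Cantonese tokens added by this checkpoint
formatted_text = f"<|TEXT_UNDERSTANDING_START|><|YUE_START|>{input_text}<|YUE_END|><|TEXT_UNDERSTANDING_END|>"

chat = [
    {"role": "user", "content": "Convert the text to speech:" + formatted_text},
    {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>" + ''.join(speech_ids_prefix)}
]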
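
The template above can be wrapped in a small helper so the same formatting is reused for every input. This is a minimal sketch, not part of the released code: `build_chat` is a hypothetical name, and `speech_ids_prefix` is assumed to be a sequence of already-formatted speech-token strings (empty when no reference audio is used).

```python
from typing import Dict, List, Sequence


def build_chat(input_text: str, speech_ids_prefix: Sequence[str] = ()) -> List[Dict[str, str]]:
    """Build the two-turn chat for Cantonese TTS (hypothetical helper)."""
    # Wrap the text in the Cantonese tokens inside the text-understanding span
    formatted_text = (
        "<|TEXT_UNDERSTANDING_START|><|YUE_START|>"
        f"{input_text}"
        "<|YUE_END|><|TEXT_UNDERSTANDING_END|>"
    )
    return [
        {"role": "user", "content": "Convert the text to speech:" + formatted_text},
        # The assistant turn is left open so the model continues with speech tokens
        {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>" + "".join(speech_ids_prefix)},
    ]


chat = build_chat("今日天氣好好")
print(chat[0]["content"])
```

The resulting `chat` would then presumably be tokenized as in the base Llasa model card, for example with `tokenizer.apply_chat_template(chat, tokenize=True, return_tensors="pt", continue_final_message=True)`; whether this checkpoint needs any further changes to that pipeline is not stated here.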

Roadmap

Train on more data
Train with emotions, speaker characteristics (gender, age)
Benchmark with CER (character error rate)
Gradio space
Train with LayerSkip
Train on better filtered data
Release training code
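
For the CER item on the roadmap, the metric itself is simple to state: the character-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The sketch below shows one way to compute it. It assumes the hypothesis comes from running some ASR system on the synthesized audio, which is not specified in this README, and the function names are illustrative only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed character by character."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One missing character out of three reference characters
print(cer("你好嗎", "你好"))
```

Working at the character level suits Cantonese text, where word boundaries are not marked by spaces. In practice, punctuation and script variants would likely need normalizing before comparison; that step is left out here.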