alvanlii/canto-llasa-1b

Work In Progress

This is a fine-tuned checkpoint of HKUSTAudio/Llasa-1B-Multilingual, trained on Cantonese audio data.
Two additional tokens, <|YUE_START|> and <|YUE_END|>, are added to the vocabulary. The chat template is:

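# input_text: the Cantonese text to synthesize
input_text = "今日天氣好好。"
# speech_ids_prefix: speech-token strings from an optional reference-audio prompt (empty list = no prompt)
speech_ids_prefix = []

formatted_text = f"<|TEXT_UNDERSTANDING_START|><|YUE_START|>{input_text}<|YUE_END|><|TEXT_UNDERSTANDING_END|>"

chat = [
    {"role": "user", "content": "Convert the text to speech:" + formatted_text},
    {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>" + "".join(speech_ids_prefix)}
]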
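The template can be wrapped in a few small helpers. This is a minimal sketch, not code from this repository: the function names are hypothetical, and the `<|s_N|>` speech-token string format is an assumption carried over from upstream Llasa, where generated speech ids are decoded to audio by the XCodec2 codec. The tokenizer call, generation, and codec decoding are deliberately left out.

```python
import re

# Assumed speech-token format (from upstream Llasa): "<|s_123|>" for codec id 123.
SPEECH_TOKEN_RE = re.compile(r"<\|s_(\d+)\|>")


def format_yue_text(input_text: str) -> str:
    """Wrap Cantonese text in the added YUE tokens and the text-understanding markers."""
    return (
        "<|TEXT_UNDERSTANDING_START|><|YUE_START|>"
        f"{input_text}"
        "<|YUE_END|><|TEXT_UNDERSTANDING_END|>"
    )


def speech_ids_to_tokens(speech_ids: list[int]) -> list[str]:
    """Hypothetical helper: turn integer codec ids into speech-token strings."""
    return [f"<|s_{i}|>" for i in speech_ids]


def build_chat(input_text: str, speech_ids_prefix: list[str]) -> list[dict]:
    """Build the two-turn chat; the assistant turn is pre-filled with an optional speech prefix."""
    return [
        {"role": "user", "content": "Convert the text to speech:" + format_yue_text(input_text)},
        {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>" + "".join(speech_ids_prefix)},
    ]


def extract_speech_ids(generated: str) -> list[int]:
    """Hypothetical helper: recover integer codec ids from generated text, skipping other tokens."""
    return [int(m) for m in SPEECH_TOKEN_RE.findall(generated)]


if __name__ == "__main__":
    chat = build_chat("你好", speech_ids_to_tokens([12, 34]))
    print(chat[1]["content"])  # <|SPEECH_GENERATION_START|><|s_12|><|s_34|>
    print(extract_speech_ids("<|SPEECH_GENERATION_START|><|s_5|><|s_6|>"))  # [5, 6]
```

Keeping the speech prefix as a list of strings matches the `''.join(speech_ids_prefix)` in the template, so an empty list gives plain text-to-speech and a non-empty one acts as a voice prompt.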

Roadmap

Train on more data
Train with emotion and speaker characteristics (gender, age)
Benchmark with CER
Gradio space
Train with LayerSkip
Train on better-filtered data
Release training code
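For the planned CER benchmark, the metric itself is the character-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The sketch below covers only that formula; which ASR model would transcribe the generated audio, and any text normalization (punctuation or whitespace stripping), are not specified in this card and are left out as assumptions. Function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of r
                curr[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),   # substitution (costs 0 when characters match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("今日天氣好好", "今日天氣好"))  # one deletion out of 6 characters
```

Character-level scoring is the usual choice for Chinese-script languages such as Cantonese, since text is not whitespace-segmented into words.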

Format: Safetensors
Model size: 1.37B params
Tensor type: BF16
