Work In Progress

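  • This is a fine-tuned checkpoint of HKUSTAudio/Llasa-1B-Multilingual, trained on Cantonese audio data.
  • Two additional tokens, <|YUE_START|> and <|YUE_END|>, are added to the vocabulary. The Cantonese input text is wrapped in them inside the usual text-understanding markers, so the chat template is: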
formatted_text = f"<|TEXT_UNDERSTANDING_START|><|YUE_START|>{input_text}<|YUE_END|><|TEXT_UNDERSTANDING_END|>"

chat = [
    {"role": "user", "content": "Convert the text to speech:" + formatted_text},
    {"role": "assistant", "content": "<|SPEECH_GENERATION_START|>" + ''.join(speech_ids_prefix)}
]
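
The template above can be wrapped in a small helper so the special tokens are written in one place. This is only a sketch: the function name `build_yue_chat` and the sample sentence are hypothetical, and `speech_ids_prefix` is assumed to be a sequence of speech-token strings (empty when no audio prompt is used).

```python
def build_yue_chat(input_text, speech_ids_prefix=()):
    """Build the two-turn chat for Cantonese text-to-speech.

    `build_yue_chat` is a hypothetical helper name, not part of the checkpoint.
    `speech_ids_prefix` is assumed to be an iterable of speech-token strings;
    leave it empty to start generation with no audio prompt.
    """
    # Wrap the text in the Cantonese tokens, inside the text-understanding markers.
    formatted_text = (
        "<|TEXT_UNDERSTANDING_START|><|YUE_START|>"
        f"{input_text}"
        "<|YUE_END|><|TEXT_UNDERSTANDING_END|>"
    )
    return [
        {"role": "user", "content": "Convert the text to speech:" + formatted_text},
        {
            "role": "assistant",
            "content": "<|SPEECH_GENERATION_START|>" + "".join(speech_ids_prefix),
        },
    ]


# Hypothetical sample input.
chat = build_yue_chat("今日天氣好好")
print(chat[0]["content"])
```

The resulting list would presumably then be tokenized with the checkpoint's tokenizer chat template, leaving the assistant turn open so that the model continues it with speech tokens; that step is not shown here.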

Roadmap

  • Train on more data
  • Train with emotions, speaker characteristics (gender, age)
  • Benchmark with CER
  • Gradio space
  • Train with LayerSkip
  • Train on better filtered data
  • Release training code
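
For the CER item, character error rate is conventionally the character-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The sketch below is a generic illustration, not the benchmark planned for this model: the function names are hypothetical, and it assumes the hypothesis comes from running a speech recognizer on the generated audio and that both strings have already been normalized.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one substituted character out of six.
print(cer("今日天氣好好", "今日天気好好"))
```

Because CER is normalized by the reference length, it can exceed 1.0 when the hypothesis contains many inserted characters.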

Model size: 1.37B params (Safetensors, tensor type BF16)