---
library_name: transformers
datasets:
- mesolitica/tts-combine-annotated
language:
- ms
---

# Malay Parler TTS Mini V1

Finetuned [parler-tts/parler-tts-mini-v1](https://huggingface.co/parler-tts/parler-tts-mini-v1) on the [Malay TTS dataset](https://huggingface.co/datasets/mesolitica/tts-combine-annotated).

Source code: https://github.com/mesolitica/malaya-speech/tree/master/session/parler-tts

Training logs on Wandb: https://wandb.ai/huseinzol05/parler-speech?nw=nwuserhuseinzol05

## requirements

Install Parler-TTS from the mesolitica fork, plus `soundfile` for writing audio files:

```bash
pip3 install git+https://github.com/mesolitica/async-parler-tts
pip3 install soundfile
```

## how to

The description controls the voice (speaker name, expressiveness, speed, pitch, recording quality); the prompt is the Malay text to speak.

```python
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("mesolitica/malay-parler-tts-mini-v1").to(device)
tokenizer = AutoTokenizer.from_pretrained("mesolitica/malay-parler-tts-mini-v1")

# Speakers to synthesize; each name is referenced in the description prompt.
speakers = [
    'Yasmin',
    'Osman',
    'Bunga',
    'Ariff',
    'Ayu',
    'Kamarul',
    'Danial',
    'Elina',
]

# Malay text to synthesize ("Husein Zolkepli is very cute and handsome, likes to eat cendol").
prompt = 'Husein zolkepli sangat comel dan kacak suka makan cendol'
# The prompt is the same for every speaker, so tokenize it once.
prompt_inputs = tokenizer(prompt, return_tensors="pt").to(device)

for s in speakers:
    description = f"{s}'s voice, delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
    description_inputs = tokenizer(description, return_tensors="pt").to(device)

    generation = model.generate(
        input_ids=description_inputs.input_ids,
        attention_mask=description_inputs.attention_mask,
        prompt_input_ids=prompt_inputs.input_ids,
        prompt_attention_mask=prompt_inputs.attention_mask,
    )
    audio_arr = generation.cpu().numpy().squeeze()
    # Use the model's own sampling rate (44.1 kHz) instead of hard-coding it.
    # WAV works with every libsndfile build; writing MP3 requires libsndfile >= 1.1.0.
    sf.write(f'{s}.wav', audio_arr, model.config.sampling_rate)
```
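The description in the loop above is a fixed template with only the speaker name substituted. When trying other styles it can help to keep that template in one place. The sketch below uses a hypothetical helper, `build_description`, that reproduces the exact string from the example by default and lets you swap the delivery and pace phrases. Whether the model honors wordings other than the default is an assumption; it was trained on annotated descriptions, so phrasings close to the original are the safest bet. Rejecting unknown speaker names is also a design choice of this sketch, not something the model enforces.

```python
# Speaker names taken from the example above.
SPEAKERS = ('Yasmin', 'Osman', 'Bunga', 'Ariff', 'Ayu', 'Kamarul', 'Danial', 'Elina')


def build_description(
    speaker: str,
    delivery: str = "a slightly expressive and animated",
    pace: str = "a moderate speed and pitch",
) -> str:
    """Hypothetical helper: fill the description template for one speaker.

    With the default arguments this returns exactly the description used
    in the how-to example.
    """
    # Fail early on typos instead of silently generating an unknown voice.
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}; expected one of {SPEAKERS}")
    return (
        f"{speaker}'s voice, delivers {delivery} speech with {pace}. "
        "The recording is of very high quality, with the speaker's voice "
        "sounding clear and very close up."
    )


if __name__ == "__main__":
    print(build_description("Yasmin"))
    print(build_description("Osman", pace="a fast speed and low pitch"))
```

In the how-to loop, `description = build_description(s)` is a drop-in replacement for the f-string.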