
Installation

Create and activate a conda environment:

conda create --name tts python=3.8.19
conda activate tts

Pip version 24 is required:

pip install --upgrade pip==24.0

Install PyTorch (CUDA 11.8 wheels):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Install all Python package requirements:

pip install -r requirements.txt

If you face errors while installing the requirements, install the libraries listed in TTS/cmd.txt by running each pip command manually.
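When a single broken pin aborts `pip install -r requirements.txt`, installing each requirement separately makes the failing package obvious. The helper below is an illustrative sketch (not part of this repo); it skips comments and blank lines and reports any requirement that fails so you can fall back to the commands in TTS/cmd.txt:

```python
import subprocess
import sys

def parse_requirements(text):
    """Return the non-empty, non-comment lines of a requirements file."""
    lines = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            lines.append(line)
    return lines

def install_one_by_one(path="requirements.txt"):
    """Install each requirement separately so a single failure doesn't block the rest."""
    with open(path) as f:
        reqs = parse_requirements(f.read())
    for req in reqs:
        result = subprocess.run([sys.executable, "-m", "pip", "install", req])
        if result.returncode != 0:
            print(f"FAILED: {req}  (install it manually; see TTS/cmd.txt)")
```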

Docker Setup (if the setup above fails, use Docker)

Pull the Docker image:

docker pull starxa2/grad_image
docker run --gpus all -it \
  -v $(pwd)/TTS:/workspace/TTS \
  starxa2/grad_image

Then, once inside the container (the Docker image already has the environment set up):

cd /workspace/TTS
conda activate grad
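To confirm the right environment is active inside the container, you can check the `CONDA_DEFAULT_ENV` variable that `conda activate` sets (the helper below is a small illustrative check, not part of this repo):

```python
import os

def active_conda_env():
    """Name of the currently activated conda environment, if any."""
    return os.environ.get("CONDA_DEFAULT_ENV")

# Inside the container, this should return "grad" after `conda activate grad`.
```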

After the environment is fully set up, build the monotonic_align code (Cython):

From the TTS root, create the nested build directory:

mkdir -p model/monotonic_align/model/monotonic_align

cd model/monotonic_align/model/monotonic_align; python setup.py build_ext --inplace; cd ../../../..
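`build_ext --inplace` leaves a compiled extension (`.so` on Linux, `.pyd` on Windows) next to setup.py. A quick illustrative check (this helper is not part of the repo) to confirm the build succeeded:

```python
import glob
import os

def find_built_extension(directory):
    """Return compiled extension files (.so / .pyd) produced by build_ext --inplace."""
    found = []
    for pattern in ("*.so", "*.pyd"):
        found.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(found)

# Example: after building, this directory should contain the compiled module.
# print(find_built_extension("model/monotonic_align/model/monotonic_align"))
```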

Download the checkpoints and save them to the following locations:

google drive link

HiFi-GAN vocoder from the drive link, saved to TTS/checkpts/hifigan.pt

Speaker encoder from the drive link, saved to TTS/spk_encoder/speaker_encoder.pt

TTS checkpoint, saved to TTS/logs/tts.pt
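Before running inference, it is worth verifying that all three files landed where the steps above expect them. The sketch below hardcodes the three paths from this section; the helper itself is illustrative, not part of the repo:

```python
import os

# Expected checkpoint locations (from the steps above), relative to the TTS root.
CHECKPOINTS = [
    "checkpts/hifigan.pt",
    "spk_encoder/speaker_encoder.pt",
    "logs/tts.pt",
]

def missing_checkpoints(root, paths=CHECKPOINTS):
    """Return the checkpoint paths that are not present under root."""
    return [p for p in paths if not os.path.isfile(os.path.join(root, p))]

# Example: an empty list means all three files are in place.
# print(missing_checkpoints("TTS"))
```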

Inference

To launch the Gradio interface for real-time inference:

python infer-gradio.py

or

  • Use the inference.py script with a text file containing a single-language transcript.
  • Ensure you are in the TTS directory when executing the inference script.
CUDA_VISIBLE_DEVICES=0 python inference.py \
  --checkpoint path_to_language_tts_checkpoint.pt \
  --audio path_of_mono_freq_audio_to_adapt.wav \
  --file ./resources/filelists/synthesis.txt \
  --log_dir ./output_wavs \
  --language hindi \
  --gender male

Example:

CUDA_VISIBLE_DEVICES=0 python inference.py \
  --checkpoint /workspace/ayush/bert/container_data/grad_cfg/gradtts_spk_emb/Speech-Backbones/Grad-TTS/logs_mel_final_att_a1a2_4_date_16/hindi_mel_dp_22k_no_pros_w_att_seg_3/cfg_indicspb_parallel_mel_dp_w_att_seg_3/hindi/grad_2810.pt \
  --audio /workspace/ayush/bert/container_data/abhi_voice.wav \
  --file /workspace/ayush/bert/container_data/grad_cfg/gradtts_spk_emb/Speech-Backbones/Grad-TTS/resources/filelists/synthesis.txt \
  --log_dir ./output_wavs \
  --language hindi \
  --gender male

To change the language checkpoint:

  1. Change the --checkpoint path to the desired language TTS checkpoint (language_TTS.pt).
  2. In --audio, provide a mono-frequency reference audio in the same language.
  3. In --file, provide the path to synthesis.txt.
  4. In --log_dir, provide the directory where the generated audio outputs will be stored.
  5. In --language, specify the language to use.
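The steps above can be wrapped in a small command builder so only the per-language arguments change between runs. The flags mirror the inference.py invocation shown earlier; the builder itself is an illustrative sketch, not part of the repo:

```python
def build_inference_cmd(checkpoint, audio, file, log_dir, language, gender, device="0"):
    """Assemble the inference.py invocation shown above as an argument list."""
    cmd = [
        "python", "inference.py",
        "--checkpoint", checkpoint,
        "--audio", audio,
        "--file", file,
        "--log_dir", log_dir,
        "--language", language,
        "--gender", gender,
    ]
    env = {"CUDA_VISIBLE_DEVICES": device}
    return cmd, env

# Example: swap in a different language checkpoint and a matching reference audio.
cmd, env = build_inference_cmd(
    checkpoint="path_to_language_tts_checkpoint.pt",
    audio="path_of_mono_freq_audio_to_adapt.wav",
    file="./resources/filelists/synthesis.txt",
    log_dir="./output_wavs",
    language="hindi",
    gender="male",
)
# import os, subprocess
# subprocess.run(cmd, env={**os.environ, **env}, check=True)  # run from the TTS directory
```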