Installation

Create and activate a conda environment:

conda create --name tts python=3.8.19
conda activate tts

pip version 24 is required:

pip install --upgrade pip==24.0

Install PyTorch (CUDA 11.8 build):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Install all Python package requirements:

pip install -r requirements.txt

If installing the requirements fails, pip install the libraries listed in TTS/cmd.txt, typing each command manually.
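Instead of typing each fallback command by hand, the loop below is a minimal sketch that runs them from the file. It assumes TTS/cmd.txt holds one shell command per line (an assumption about its format; the repo does not document it), skipping blank lines and `#` comments.

```python
import os
import subprocess


def read_commands(text):
    """Return the non-empty, non-comment lines of a cmd.txt-style file."""
    cmds = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            cmds.append(line)
    return cmds


# Only attempt the installs when the file is actually present.
if os.path.exists("TTS/cmd.txt"):
    with open("TTS/cmd.txt") as f:
        for cmd in read_commands(f.read()):
            print("running:", cmd)
            # check=False: keep going even if one install fails,
            # mirroring the "try each command" instruction above.
            subprocess.run(cmd, shell=True, check=False)
```
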
Docker Setup (if the above setup fails, use Docker)

Pull the Docker image:

docker pull starxa2/grad_image

Run the container:

docker run --gpus all -it \
-v $(pwd)/TTS:/workspace/TTS \
starxa2/grad_image

Once inside the container (the image already has the environment set up):

cd /workspace/TTS
conda activate grad
After the environment is fully set up, build monotonic_align (Cython code). From the TTS directory, first create the nested build directories:

cd model/monotonic_align/
mkdir model
cd model
mkdir monotonic_align
cd ../../..

Then build the extension in place and return to the TTS directory:

cd model/monotonic_align/model/monotonic_align; python setup.py build_ext --inplace; cd ../../../..
Download the checkpoints from the Google Drive link and save them to the following locations:

- HiFi-GAN: TTS/checkpts/hifigan.pt
- Speaker encoder: TTS/spk_encoder/speaker_encoder.pt
- TTS checkpoint: TTS/logs/tts.pt
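Before running inference, it is worth checking that all three checkpoints ended up in the expected locations. The sketch below does that; the paths are the ones listed above, and `missing_checkpoints` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

# Checkpoint locations listed in the instructions above.
EXPECTED = [
    "TTS/checkpts/hifigan.pt",
    "TTS/spk_encoder/speaker_encoder.pt",
    "TTS/logs/tts.pt",
]


def missing_checkpoints(paths, root="."):
    """Return the subset of paths that do not exist as files under root."""
    return [p for p in paths if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints(EXPECTED)
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```
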
Inference

To use the Gradio interface for real-time inference:

python infer-gradio.py

Alternatively, use the inference.py script, which reads a text file containing a single-language transcript.
Ensure you are in the TTS directory when executing the inference script.
CUDA_VISIBLE_DEVICES=0 python inference.py \
--checkpoint path_to_language_tts_checkpoint.pt \
--audio path_of_mono_freq_audio_to_adapt.wav \
--file ./resources/filelists/synthesis.txt \
--log_dir ./output_wavs \
--language hindi \
--gender male
Example:
CUDA_VISIBLE_DEVICES=0 python inference.py \
--checkpoint /workspace/ayush/bert/container_data/grad_cfg/gradtts_spk_emb/Speech-Backbones/Grad-TTS/logs_mel_final_att_a1a2_4_date_16/hindi_mel_dp_22k_no_pros_w_att_seg_3/cfg_indicspb_parallel_mel_dp_w_att_seg_3/hindi/grad_2810.pt \
--audio /workspace/ayush/bert/container_data/abhi_voice.wav \
--file /workspace/ayush/bert/container_data/grad_cfg/gradtts_spk_emb/Speech-Backbones/Grad-TTS/resources/filelists/synthesis.txt \
--log_dir ./output_wavs \
--language hindi \
--gender male
To change the language checkpoint:

- change --checkpoint to the path of the desired language_TTS.pt
- in --audio, provide a mono-frequency reference audio that is in the same language
- in --file, provide the path to synthesis.txt
- in --log_dir, provide the path where the generated audio outputs for the text will be stored
- in --language, specify the language to use
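The per-language pieces above can also be assembled programmatically rather than by editing a shell command. This is a minimal sketch; `build_inference_cmd` is a hypothetical helper, the flag names follow the example above, and the checkpoint/audio paths in the usage line are placeholders.

```python
def build_inference_cmd(checkpoint, audio, language, gender,
                        file="./resources/filelists/synthesis.txt",
                        log_dir="./output_wavs"):
    """Return the inference.py argument list for the given language setup."""
    return [
        "python", "inference.py",
        "--checkpoint", checkpoint,
        "--audio", audio,
        "--file", file,
        "--log_dir", log_dir,
        "--language", language,
        "--gender", gender,
    ]


# Placeholder paths; substitute your own checkpoint and reference audio.
cmd = build_inference_cmd("logs/hindi_tts.pt", "ref_voice.wav", "hindi", "male")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run, with CUDA_VISIBLE_DEVICES set via the env argument if you need to pin a GPU as in the example above.
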