This is a pruned and reorganized version of SWivid/F5-TTS, made for use with the fairytaler
Python library, an unofficial reimplementation of F5-TTS built for fast, lightweight inference.
Installation
Fairytaler assumes you have a working CUDA environment to install into.
pip install fairytaler
This will install the reimplementation library.
How to Use
You do not need to pre-download anything; the necessary data is downloaded at runtime.
Command Line
Use the fairytaler binary from the command line like so:
fairytaler examples/reference.wav examples/reference.txt "Fairytaler is an unofficial minimal re-implementation of F5 TTS."
Reference audio sourced from DiPCo
Many options are available; for complete documentation, run fairytaler --help.
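If you want to drive the command from a script, the positional form shown above (reference audio, reference text file, then the sentence to speak) can be assembled as an argument list. This is a minimal sketch: the helper name build_fairytaler_command is hypothetical, and it uses only the three positional arguments from the example, since the optional flags are documented by fairytaler --help rather than here.

```python
import shlex
from typing import List


def build_fairytaler_command(reference_audio: str, reference_text: str, text: str) -> List[str]:
    """Return the argv list for one fairytaler call (positional arguments only)."""
    return ["fairytaler", reference_audio, reference_text, text]


command = build_fairytaler_command(
    "examples/reference.wav",
    "examples/reference.txt",
    "Fairytaler is an unofficial minimal re-implementation of F5 TTS.",
)

# shlex.join quotes the sentence so the printed line can be pasted into a shell.
print(shlex.join(command))

# To actually run it (requires fairytaler to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the list to subprocess.run avoids shell quoting problems when the sentence contains quotes or other special characters.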
Python
from fairytaler import F5TTSPipeline

pipeline = F5TTSPipeline.from_pretrained("benjamin-paine/fairytaler", device="auto")
output_wav_file = pipeline(
    text="Hello, this is some test audio!",
    reference_audio="examples/reference.wav",
    reference_text="examples/reference.txt",
    output_save=True
)
print(f"Output saved to {output_wav_file}")
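Because loading the pipeline is the expensive part, it is natural to load it once and call it for several lines. The sketch below shows that pattern with the same keyword arguments as the example above. The helper synthesize_lines and the stand_in_pipeline stub are hypothetical names introduced for illustration; the stub only returns a fake file name so the sketch runs without a GPU.

```python
from typing import Any, Callable, Dict, List


def synthesize_lines(
    pipeline: Callable[..., Any],
    lines: List[str],
    reference_audio: str,
    reference_text: str,
) -> Dict[str, Any]:
    """Call an already-loaded pipeline once per line and map each line to its result."""
    results: Dict[str, Any] = {}
    for line in lines:
        # Same keyword arguments as the example above; output_save=True asks for a file.
        results[line] = pipeline(
            text=line,
            reference_audio=reference_audio,
            reference_text=reference_text,
            output_save=True,
        )
    return results


def stand_in_pipeline(**kwargs: Any) -> str:
    """Hypothetical stub, not part of fairytaler: returns a fake output path."""
    return f"stub-{len(kwargs['text'])}.wav"


outputs = synthesize_lines(
    stand_in_pipeline,
    ["First line.", "Second line."],
    "examples/reference.wav",
    "examples/reference.txt",
)
for line, path in outputs.items():
    print(f"{line!r} -> {path}")
```

To use it for real, pass the object returned by F5TTSPipeline.from_pretrained in place of stand_in_pipeline.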
The full execution signature is:
def __call__(
    self,
    text: Union[str, List[str]],
    reference_audio: AudioType,
    reference_text: str,
    reference_sample_rate: Optional[int]=None,
    seed: SeedType=None,
    speed: float=1.0,
    sway_sampling_coef: float=-1.0,
    target_rms: float=0.1,
    cross_fade_duration: float=0.15,
    punctuation_pause_duration: float=0.10,
    num_steps: int=32,
    cfg_strength: float=2.0,
    fix_duration: Optional[float]=None,
    use_tqdm: bool=False,
    output_format: AUDIO_OUTPUT_FORMAT_LITERAL="wav",
    output_save: bool=False,
    chunk_callback: Optional[Callable[[AudioResultType], None]]=None,
    chunk_callback_format: AUDIO_OUTPUT_FORMAT_LITERAL="float",
) -> AudioResultType
Format values are wav, ogg, flac, mp3, float and int. Passing output_save=True
saves the result to a file; omitting it returns the data directly.
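The chunk_callback and chunk_callback_format parameters in the signature suggest that audio can be consumed piece by piece. The sketch below checks the format name against the values listed above and collects whatever the callback receives. It assumes the callback is invoked once per generated chunk, which the signature implies but this document does not state. The names collect_chunks and stand_in_pipeline are hypothetical, and the stub passes sentence lengths instead of audio so the sketch runs without the library.

```python
from typing import Any, Callable, List

# Format names as listed above.
FORMATS = ("wav", "ogg", "flac", "mp3", "float", "int")


def collect_chunks(
    pipeline: Callable[..., Any],
    text: str,
    reference_audio: str,
    reference_text: str,
    chunk_format: str = "float",
) -> List[Any]:
    """Run the pipeline and gather every value handed to chunk_callback."""
    if chunk_format not in FORMATS:
        raise ValueError(f"unknown format {chunk_format!r}; expected one of {FORMATS}")
    chunks: List[Any] = []
    pipeline(
        text=text,
        reference_audio=reference_audio,
        reference_text=reference_text,
        chunk_callback=chunks.append,  # assumed: called once per chunk
        chunk_callback_format=chunk_format,
    )
    return chunks


def stand_in_pipeline(text: str, chunk_callback=None, **kwargs: Any) -> None:
    """Hypothetical stub, not part of fairytaler: one 'chunk' per sentence."""
    for sentence in text.split(". "):
        if chunk_callback is not None:
            chunk_callback(len(sentence))


chunks = collect_chunks(
    stand_in_pipeline,
    "One. Two three.",
    "examples/reference.wav",
    "examples/reference.txt",
)
print(chunks)
```

Validating the format name before the call fails fast on a typo instead of after the model has been loaded and run.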
Citations
@misc{chen2024f5ttsfairytalerfakesfluent,
title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
year={2024},
eprint={2410.06885},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2410.06885},
}
@misc{vansegbroeck2019dipcodinnerparty,
title={DiPCo -- Dinner Party Corpus},
author={Maarten Van Segbroeck and Ahmed Zaid and Ksenia Kutsenko and Cirenia Huerta and Tinh Nguyen and Xuewen Luo and Björn Hoffmeister and Jan Trmal and Maurizio Omologo and Roland Maas},
year={2019},
eprint={1909.13447},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/1909.13447},
}