Chatterbox TTS French 🥖

Chatterbox TTS French is a text-to-speech model for French, fine-tuned from the Chatterbox base model on voice data from the Emilia dataset to produce natural, expressive speech.

Usage Example

The script below loads the base Chatterbox model, replaces its T3 weights with the fine-tuned French checkpoint, and synthesizes a sample sentence:

import torch
import soundfile as sf
from chatterbox.tts import ChatterboxTTS
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Configuration
MODEL_REPO = "Thomcles/Chatterbox-TTS-French"
CHECKPOINT_FILENAME = "t3_cfg.safetensors"
OUTPUT_PATH = "output_cloned_voice.wav"
TEXT_TO_SYNTHESIZE = "Jean-Paul Sartre laisse à la postérité une œuvre considérable, tant littéraire que philosophique, ayant influencé à la fois la vie politique française d'après-guerre et les penseurs de son temps (Merleau-Ponty et Alain Badiou notamment)."

def get_device() -> str:
    return "cuda" if torch.cuda.is_available() else "cpu"

def download_checkpoint(repo: str, filename: str) -> str:
    return hf_hub_download(repo_id=repo, filename=filename)

def load_tts_model(repo: str, checkpoint_file: str, device: str) -> ChatterboxTTS:
    model = ChatterboxTTS.from_pretrained(device=device)
    checkpoint_path = download_checkpoint(repo, checkpoint_file)
    t3_state = load_file(checkpoint_path, device="cpu")
    model.t3.load_state_dict(t3_state)
    return model

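def synthesize_speech(model: ChatterboxTTS, text: str, audio_prompt_path: str | None = None, **kwargs) -> torch.Tensor:
    # audio_prompt_path may be None, in which case no reference recording is passed to generate().
    with torch.inference_mode():
        return model.generate(
            text=text,
            audio_prompt_path=audio_prompt_path,
            **kwargs
        )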

def save_audio(waveform: torch.Tensor, path: str, sample_rate: int):
    sf.write(path, waveform.squeeze().cpu().numpy(), sample_rate)

def main():
    print("Loading model...")
    device = get_device()
    model = load_tts_model(MODEL_REPO, CHECKPOINT_FILENAME, device)

    print(f"Generating speech on {device}...")
    wav = synthesize_speech(
        model,
        TEXT_TO_SYNTHESIZE,
        audio_prompt_path=None,
        exaggeration=0.5,
        temperature=0.6,
        cfg_weight=0.3
    )

    print(f"Saving output to: {OUTPUT_PATH}")
    save_audio(wav, OUTPUT_PATH, model.sr)
    print("Done.")

if __name__ == "__main__":
    main()

Running the script writes the synthesized audio to output_cloned_voice.wav.
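The script above calls generate() with audio_prompt_path=None, so no reference recording is supplied. To condition the output on a specific voice, the same parameter can point to a short audio clip. The helper below is a minimal sketch under assumptions: the function name synthesize_with_reference and the file name reference_fr.wav are hypothetical, and the model argument is expected to be the object returned by load_tts_model() in the script above. It only checks that the file exists and forwards it to generate().

```python
from pathlib import Path
from typing import Any


def synthesize_with_reference(model: Any, text: str, reference_wav: str, **kwargs: Any) -> Any:
    """Generate speech conditioned on a reference recording.

    `model` is assumed to expose generate(text=..., audio_prompt_path=..., **kwargs),
    as the ChatterboxTTS instance does in the script above.
    `reference_wav` is a hypothetical path to a short clip of the target voice.
    """
    ref = Path(reference_wav)
    # Fail early with a clear message rather than an audio-loading error inside generate().
    if not ref.is_file():
        raise FileNotFoundError(f"Reference audio not found: {ref}")
    return model.generate(text=text, audio_prompt_path=str(ref), **kwargs)


# Possible usage inside main() from the script above (file name is illustrative):
# wav = synthesize_with_reference(
#     model, TEXT_TO_SYNTHESIZE, "reference_fr.wav",
#     exaggeration=0.5, temperature=0.6, cfg_weight=0.3,
# )
# save_audio(wav, OUTPUT_PATH, model.sr)
```

Keeping the existence check separate from generation makes a missing or misspelled reference path easy to diagnose before any model work starts.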

Base model license

The base model is licensed under the MIT License.
Base model: Chatterbox
License: MIT

Training Data License

This model was fine-tuned using a dataset licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
Dataset: Emilia
License: Creative Commons Attribution 4.0 International

Contact me

Interested in fine-tuning a TTS model in a specific language or building a multilingual voice solution? Don’t hesitate to reach out.
