(WIP) 40_000 / 100_000 steps done.

Total dataset length: ~188 hours of Russian speech.
Training steps: 10 epochs of 10_000 steps
Loss: 3.794 (current average; to be updated as training continues)

Available Russian speakers (speaker names as in the original dataset):

Speaker	Samples	Duration (hours)
irina_bulekova	8012	17.50
smelova_s	26371	41.65
alina_archibasova	14097	22.07
maksim_suslov	6440	20.70
daniel_che	5502	19.20
evgenii_lebedev	3811	12.50
evgenii_babincev	5614	8.90
aleksandr_zbarovskii	6212	9.39
jam_nebesky	8052	19.82
aleksandr_kotov	12706	16.63
TOTAL	96817	188.35
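
The table gives only counts and total hours per speaker. As a rough sketch, the snippet below derives the mean clip length and each speaker's share of the audio from those figures. The numbers are copied from the table; the helper names (`mean_clip_seconds`, `summarize`) are illustrative and not part of any released tooling.

```python
# Speaker statistics copied from the table above: (samples, duration in hours).
SPEAKERS = {
    "irina_bulekova": (8012, 17.50),
    "smelova_s": (26371, 41.65),
    "alina_archibasova": (14097, 22.07),
    "maksim_suslov": (6440, 20.70),
    "daniel_che": (5502, 19.20),
    "evgenii_lebedev": (3811, 12.50),
    "evgenii_babincev": (5614, 8.90),
    "aleksandr_zbarovskii": (6212, 9.39),
    "jam_nebesky": (8052, 19.82),
    "aleksandr_kotov": (12706, 16.63),
}


def mean_clip_seconds(samples: int, hours: float) -> float:
    """Average length of one sample in seconds."""
    return hours * 3600.0 / samples


def summarize(speakers: dict) -> list:
    """Return (name, mean clip seconds, percent of total audio) per speaker."""
    total_hours = sum(hours for _, hours in speakers.values())
    return [
        (name, mean_clip_seconds(samples, hours), 100.0 * hours / total_hours)
        for name, (samples, hours) in speakers.items()
    ]


if __name__ == "__main__":
    # Print speakers ordered by their share of the audio, largest first.
    for name, secs, share in sorted(summarize(SPEAKERS), key=lambda r: -r[2]):
        print(f"{name:22s} {secs:6.2f} s/clip  {share:5.1f}% of audio")
```

The per-speaker durations are rounded to two decimals, so their sum can differ from the TOTAL row by about 0.01 hours.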

Original model card

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

Model Details

Model Capabilities

Human-Like Speech: Natural intonation, emotion, and rhythm that is superior to SOTA closed-source models
Zero-Shot Voice Cloning: Clone voices without prior fine-tuning
Guided Emotion and Intonation: Control speech and emotion characteristics with simple tags
Low Latency: ~200ms streaming latency for real-time applications, reducible to ~100ms with input streaming
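
To make the "simple tags" idea concrete, here is a minimal sketch of a prompt builder. It rests on two assumptions that this card does not confirm: that the finetune expects prompts of the form `speaker: text`, and that the tag vocabulary matches the one used by the upstream English models (`<laugh>`, `<sigh>`, and so on). The function name `build_prompt` is hypothetical.

```python
import re

# Speaker names as listed in this card's speaker table.
RU_SPEAKERS = {
    "irina_bulekova", "smelova_s", "alina_archibasova", "maksim_suslov",
    "daniel_che", "evgenii_lebedev", "evgenii_babincev",
    "aleksandr_zbarovskii", "jam_nebesky", "aleksandr_kotov",
}

# ASSUMPTION: tag names taken from the upstream English models;
# whether this Russian finetune responds to them is not confirmed.
ASSUMED_TAGS = {
    "laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp",
}


def build_prompt(speaker: str, text: str, validate_tags: bool = True) -> str:
    """Build a prompt string in the ASSUMED 'speaker: text' format."""
    if speaker not in RU_SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    if validate_tags:
        # Reject any <tag> that is not in the assumed vocabulary.
        for tag in re.findall(r"<([^<>]+)>", text):
            if tag not in ASSUMED_TAGS:
                raise ValueError(f"unknown tag: <{tag}>")
    return f"{speaker}: {text}"


if __name__ == "__main__":
    print(build_prompt("smelova_s", "Привет! <laugh> Как дела?"))
```

Validating the speaker name up front catches typos before any expensive generation starts; pass `validate_tags=False` to experiment with tags outside the assumed list.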

Model Sources

GitHub Repo: https://github.com/canopyai/Orpheus-TTS
Blog Post: https://canopylabs.ai/model-releases
Colab Inference Notebook: notebook link
One-Click Deployment on Baseten: https://www.baseten.co/library/orpheus-tts/

Usage

Check out our Colab notebook or GitHub repo (https://github.com/canopyai/Orpheus-TTS) to see how to run inference on our finetuned models.
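
For streaming use, the audio arrives in chunks that have to be written out as they are produced. The sketch below saves such a stream to a WAV file and reports the time to the first chunk. It assumes the chunks are raw 16-bit mono PCM at 24 kHz, which this card does not state, and it uses a stand-in generator (`fake_stream`) instead of the real model so that it runs without a GPU. Both function names are hypothetical.

```python
import time
import wave
from typing import Iterable, Optional, Tuple


def save_pcm_stream(
    chunks: Iterable[bytes], path: str, sample_rate: int = 24000
) -> Tuple[Optional[float], float]:
    """Write 16-bit mono PCM chunks to a WAV file.

    Returns (seconds until the first chunk arrived, audio duration in seconds).
    ASSUMPTION: 24 kHz, 16-bit, mono; adjust if the model emits another format.
    """
    start = time.monotonic()
    first_chunk_latency = None
    total_frames = 0
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        for chunk in chunks:
            if first_chunk_latency is None:
                first_chunk_latency = time.monotonic() - start
            wf.writeframes(chunk)
            total_frames += len(chunk) // 2
    return first_chunk_latency, total_frames / sample_rate


def fake_stream(n_chunks: int = 5, frames_per_chunk: int = 2400):
    """Stand-in for a model's audio generator: yields chunks of silence."""
    for _ in range(n_chunks):
        yield b"\x00\x00" * frames_per_chunk


if __name__ == "__main__":
    latency, duration = save_pcm_stream(fake_stream(), "output.wav")
    print(f"first chunk after {latency:.4f} s, wrote {duration:.2f} s of audio")
```

To use it with the real model, replace `fake_stream()` with whatever iterable of audio chunks the inference code from the Colab or GitHub repo yields.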

Model Misuse

Do not use our models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. We disclaim responsibility for any use.

papacliff/orpheus-3b-0.1-ft-ru