(WIP) 40_000 / 100_000 steps done.

  • Total dataset length: ~188 hours of Russian speech.
  • Training steps: 10 epochs of 10_000 steps each (100_000 total).
  • Loss: 3.794 (current average; to be updated).

Available Russian speakers (names as in the original dataset):

Speaker               Samples  Duration (hours)
irina_bulekova           8012             17.50
smelova_s               26371             41.65
alina_archibasova       14097             22.07
maksim_suslov            6440             20.70
daniel_che               5502             19.20
evgenii_lebedev          3811             12.50
evgenii_babincev         5614              8.90
aleksandr_zbarovskii     6212              9.39
jam_nebesky              8052             19.82
aleksandr_kotov         12706             16.63
TOTAL                   96817            188.35
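
The totals can be re-derived from the per-speaker rows, and the same numbers give a rough mean clip length per speaker. The sketch below is illustrative only: the table is copied by hand into a dictionary, and the names `SPEAKERS` and `summarize` are hypothetical, not part of any dataset tooling.

```python
# Per-speaker (samples, hours), copied by hand from the table above.
SPEAKERS = {
    "irina_bulekova": (8012, 17.50),
    "smelova_s": (26371, 41.65),
    "alina_archibasova": (14097, 22.07),
    "maksim_suslov": (6440, 20.70),
    "daniel_che": (5502, 19.20),
    "evgenii_lebedev": (3811, 12.50),
    "evgenii_babincev": (5614, 8.90),
    "aleksandr_zbarovskii": (6212, 9.39),
    "jam_nebesky": (8052, 19.82),
    "aleksandr_kotov": (12706, 16.63),
}


def summarize(speakers):
    """Return total samples, total hours, and mean clip length (seconds) per speaker."""
    total_samples = sum(n for n, _ in speakers.values())
    total_hours = sum(h for _, h in speakers.values())
    mean_clip_s = {name: h * 3600 / n for name, (n, h) in speakers.items()}
    return total_samples, total_hours, mean_clip_s


total_samples, total_hours, mean_clip_s = summarize(SPEAKERS)
print(total_samples, round(total_hours, 2))

# List speakers from shortest to longest average clip.
for name, secs in sorted(mean_clip_s.items(), key=lambda kv: kv[1]):
    print(f"{name:<22}{secs:6.1f} s")
```

The sample counts sum to 96817, matching the TOTAL row. The rounded per-speaker durations sum to 188.36 hours, 0.01 above the reported 188.35, which is presumably a rounding difference in the per-speaker figures.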

Original model card

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis, with exceptional clarity, expressiveness, and real-time streaming performance.

Model Details

Model Capabilities

  • Human-Like Speech: Natural intonation, emotion, and rhythm that is superior to SOTA closed-source models
  • Zero-Shot Voice Cloning: Clone voices without prior fine-tuning
  • Guided Emotion and Intonation: Control speech and emotion characteristics with simple tags
  • Low Latency: ~200 ms streaming latency for real-time applications, reducible to ~100 ms with input streaming
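
Streaming latency figures like the ones above are usually read as time to first audio chunk. One way to check this on your own hardware is to time the first item of whatever streaming generator you use. The sketch below assumes only that the generator yields audio chunks as `bytes`; `time_to_first_chunk` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in, not a real TTS call.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(stream: Iterable[bytes]) -> Tuple[float, Iterator[bytes]]:
    """Return (seconds until the first chunk, iterator over all chunks).

    Raises StopIteration if the stream yields nothing.
    """
    start = time.perf_counter()
    it = iter(stream)
    first = next(it)  # blocks until the producer yields its first chunk
    latency = time.perf_counter() - start

    def replay() -> Iterator[bytes]:
        # Hand back the chunk we consumed, then the rest of the stream.
        yield first
        yield from it

    return latency, replay()


def fake_stream(delay_s: float = 0.05, chunks: int = 3) -> Iterator[bytes]:
    """Stand-in producer; replace with a real streaming TTS generator."""
    for i in range(chunks):
        time.sleep(delay_s)
        yield bytes([i]) * 4


latency, audio = time_to_first_chunk(fake_stream())
print(f"time to first chunk: {latency * 1000:.0f} ms")
total_bytes = sum(len(chunk) for chunk in audio)
```

Because the helper returns a replay iterator, the measured chunk is not lost and the audio can still be written out or played in full.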

Model Sources

Usage

See our Colab (link to Colab) or GitHub (link to GitHub) for how to run inference on our fine-tuned models.
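
When selecting one of the Russian speakers, a small prompt helper can catch misspelled names early. The sketch below assumes this fine-tune follows the upstream Orpheus convention of prefixing the text with the speaker name and a colon; that convention is not confirmed for this checkpoint, and `build_prompt` and `RU_SPEAKERS` are hypothetical names.

```python
# Speaker names as listed for this fine-tune.
RU_SPEAKERS = frozenset({
    "irina_bulekova", "smelova_s", "alina_archibasova", "maksim_suslov",
    "daniel_che", "evgenii_lebedev", "evgenii_babincev",
    "aleksandr_zbarovskii", "jam_nebesky", "aleksandr_kotov",
})


def build_prompt(text: str, speaker: str) -> str:
    """Format text for a given speaker.

    Assumption: the model expects "<speaker>: <text>", as in upstream Orpheus.
    """
    if speaker not in RU_SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    return f"{speaker}: {text}"


prompt = build_prompt("Привет, как дела?", "smelova_s")
print(prompt)  # smelova_s: Привет, как дела?
```

The resulting string would then be passed to whichever inference path the Colab or GitHub instructions describe.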

Model Misuse

Do not use our models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. We disclaim responsibility for any use.

