Vikhrmodels/salt-qwen2.5-0.5b-tts
Tags: Safetensors, qwen2
Datasets: openslr/librispeech_asr, its5Q/bigger-ru-book, mozilla-foundation/common_voice_12_0
Languages: English, Russian, Ukrainian
Model Performance Overview
Metrics:
PESQ: Perceptual Evaluation of Speech Quality (higher = better).
STOI: Short-Time Objective Intelligibility (closer to 1 = better).
SI-SDR: Scale-Invariant Signal-to-Distortion Ratio (higher = better).
| Model | PESQ@200 | STOI@200 | SI-SDR@200 |
|---|---|---|---|
| Fish-audio-1.5 | 1.20 | 0.16 | 23.0 |
| SALT-tts | 1.11 | 0.16 | 23.58 |
| SALT-tts+asr | 1.09 | 0.18 | 23.09 |
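To make the SI-SDR metric listed above concrete, here is a minimal sketch of its standard textbook definition in plain Python. It is not the evaluation code used for the table; the function name `si_sdr` and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    Illustrative sketch only: `reference` and `estimate` are equal-length
    sequences of float samples. `eps` is an assumed stabilizer that avoids
    division by zero and log of zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Remove the mean (DC offset) from both signals.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Project the estimate onto the reference to get the optimal scaling.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled target and a residual distortion.
    target = [alpha * r for r in ref]
    distortion = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, multiplying the estimate by any non-zero gain leaves the score essentially unchanged, which is what "scale-invariant" refers to.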
Our Solution
Method: Extends a pre-trained LLM with audio tokens and fine-tunes it on the TTS task.
Training:
- BigCodec tokenizer (supports Slavic languages).
- Training time: 100 H100 GPU hours.
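The idea of extending an LLM vocabulary with audio tokens can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the `AudioVocab` class, the start/end marker tokens, the offset scheme, the vocabulary sizes, and the loss masking with `-100` are all assumptions chosen to show how discrete codec codes might be mapped into new token ids placed after the text vocabulary.

```python
from dataclasses import dataclass
from typing import List, Tuple

IGNORE_INDEX = -100  # common "ignore this position" label value; assumed here


@dataclass
class AudioVocab:
    """Hypothetical layout: [text tokens][<audio_start>][<audio_end>][codec codes]."""

    base_vocab_size: int  # size of the original text vocabulary (assumed)
    codebook_size: int    # number of discrete codec codes (assumed)

    @property
    def start_id(self) -> int:
        return self.base_vocab_size

    @property
    def end_id(self) -> int:
        return self.base_vocab_size + 1

    @property
    def code_offset(self) -> int:
        return self.base_vocab_size + 2

    @property
    def total_size(self) -> int:
        # The embedding matrix would need to grow to this many rows.
        return self.code_offset + self.codebook_size

    def encode(self, codes: List[int]) -> List[int]:
        """Map codec codes to token ids in the extended vocabulary."""
        for c in codes:
            if not 0 <= c < self.codebook_size:
                raise ValueError(f"codec code out of range: {c}")
        return [self.code_offset + c for c in codes]

    def decode(self, token_ids: List[int]) -> List[int]:
        """Recover codec codes, skipping text and marker tokens."""
        return [
            t - self.code_offset
            for t in token_ids
            if self.code_offset <= t < self.total_size
        ]


def build_tts_example(
    text_ids: List[int], codes: List[int], vocab: AudioVocab
) -> Tuple[List[int], List[int]]:
    """Build (input_ids, labels) so the loss covers only the audio part."""
    audio_ids = [vocab.start_id] + vocab.encode(codes) + [vocab.end_id]
    input_ids = list(text_ids) + audio_ids
    labels = [IGNORE_INDEX] * len(text_ids) + audio_ids
    return input_ids, labels


# Usage with small, made-up sizes.
vocab = AudioVocab(base_vocab_size=1000, codebook_size=16)
input_ids, labels = build_tts_example([5, 6, 7], [0, 3, 15], vocab)
```

In this sketch the model would be trained to continue a text prompt with audio token ids, and the generated ids would be mapped back to codec codes with `decode` before being passed to the codec's decoder to produce a waveform.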
Resources
Code: GitHub Repo
Model size: 501M params
Tensor type: F32
Base model: Qwen/Qwen2.5-0.5B (this model is a fine-tune of it)
Datasets used to train Vikhrmodels/salt-qwen2.5-0.5b-tts: openslr/librispeech_asr, mozilla-foundation/common_voice_12_0, its5Q/bigger-ru-book