Vikhrmodels/salt-qwen2.5-0.5b-tts


Model Performance Overview

Metrics:

PESQ: Perceptual Evaluation of Speech Quality (higher = better).
STOI: Short-Time Objective Intelligibility (closer to 1 = better).
SI-SDR: Scale-Invariant Signal-to-Distortion Ratio (higher = better).
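SI-SDR is the only one of the three metrics with a short closed form, so a minimal NumPy sketch is included here for reference. It follows the common zero-mean definition and is not necessarily the exact evaluation code behind the table below. The name `si_sdr` and the small `eps` guard are illustrative choices. PESQ (ITU-T P.862) and STOI are defined by more involved reference algorithms and are normally computed with existing implementations rather than re-derived.

```python
import numpy as np


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Remove the DC offset so constant shifts do not affect the score.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference: projection of the estimate onto it.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Whatever remains after the projection is counted as distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return float(10.0 * np.log10(ratio))


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
print(si_sdr(clean, noisy))
print(si_sdr(clean, 5 * noisy))  # rescaling the estimate leaves the score unchanged
```

Because the reference is rescaled to best match the estimate, a louder or quieter output with the same shape gets the same score; only the residual relative to that projection is penalized.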

Model	PESQ@200	STOI@200	SI-SDR@200
Fish-audio-1.5	1.20	0.16	23.0
SALT-tts	1.11	0.16	23.58
SALT-tts+asr	1.09	0.18	23.09
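Read column by column, the table shows that no single model leads on every metric. The short sketch below only encodes the reported numbers and picks the top model per column, treating higher as better for all three (STOI is bounded above by 1, so "closer to 1" also means higher). The `SCORES` dictionary and `best_per_metric` helper are illustrative and not part of the released code.

```python
# Scores copied from the table above; higher is better for every column.
SCORES = {
    "Fish-audio-1.5": {"PESQ@200": 1.20, "STOI@200": 0.16, "SI-SDR@200": 23.0},
    "SALT-tts": {"PESQ@200": 1.11, "STOI@200": 0.16, "SI-SDR@200": 23.58},
    "SALT-tts+asr": {"PESQ@200": 1.09, "STOI@200": 0.18, "SI-SDR@200": 23.09},
}


def best_per_metric(scores):
    """Return, for each metric column, the model with the highest value."""
    metrics = next(iter(scores.values())).keys()
    return {m: max(scores, key=lambda name: scores[name][m]) for m in metrics}


print(best_per_metric(SCORES))
```

With the reported values this picks Fish-audio-1.5 for PESQ, SALT-tts+asr for STOI, and SALT-tts for SI-SDR.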

Our Solution

Method: extends a pre-trained LLM (the base model, Qwen/Qwen2.5-0.5B) with audio tokens and fine-tunes it on the TTS task.
Training:
- BigCodec tokenizer (supports Slavic languages).
- Training time: 100 H100 GPU hours.
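The card does not spell out how the audio tokens are laid out in the extended vocabulary. One common way to "extend a pre-trained LLM with audio tokens" is to append the codec's codebook entries after the text vocabulary, add start/end markers, and train on sequences of text followed by audio. The sketch below assumes exactly that layout; `AudioVocab`, the marker placement, and the single-codebook assumption are hypothetical and are not taken from the SALT code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioVocab:
    """Hypothetical layout: codec codes appended after the text vocabulary."""
    text_vocab_size: int  # size of the original LLM vocabulary
    codebook_size: int    # number of codec codes (assumed single codebook)

    @property
    def audio_start(self) -> int:
        return self.text_vocab_size + self.codebook_size

    @property
    def audio_end(self) -> int:
        return self.audio_start + 1

    @property
    def total_size(self) -> int:
        return self.audio_end + 1

    def code_to_token(self, code: int) -> int:
        if not 0 <= code < self.codebook_size:
            raise ValueError(f"codec code out of range: {code}")
        return self.text_vocab_size + code

    def token_to_code(self, token: int) -> int:
        code = token - self.text_vocab_size
        if not 0 <= code < self.codebook_size:
            raise ValueError(f"not an audio token: {token}")
        return code


def build_training_sequence(vocab, text_ids, codec_codes):
    """Layout: [text tokens] <audio_start> [audio tokens] <audio_end>."""
    return (list(text_ids) + [vocab.audio_start]
            + [vocab.code_to_token(c) for c in codec_codes]
            + [vocab.audio_end])


def extract_codec_codes(vocab, generated):
    """Recover codec codes from the ids between the audio markers."""
    start = generated.index(vocab.audio_start) + 1
    end = generated.index(vocab.audio_end, start)
    return [vocab.token_to_code(t) for t in generated[start:end]]


vocab = AudioVocab(text_vocab_size=100, codebook_size=16)  # toy sizes
seq = build_training_sequence(vocab, [1, 2, 3], [0, 5, 15])
print(seq, extract_codec_codes(vocab, seq))
```

With Hugging Face `transformers`, the model side of such an extension would typically be handled by calling `resize_token_embeddings` on the model so the embedding matrix covers the new ids; the codec (here, BigCodec) then turns the recovered codes back into a waveform.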

Resources

Code: GitHub Repo


Safetensors

Model size: 501M params

Tensor type: F32
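As a rough sizing check, the parameter count and tensor type above give the memory needed just to hold the weights (about 501M × 4 bytes in F32). The helper below is a back-of-the-envelope sketch: it ignores activations, KV cache, and framework overhead, and the BF16 row assumes you cast the weights yourself.

```python
PARAMS = 501_000_000  # "501M params" as reported on the card (rounded)
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}


def weight_memory_gib(num_params, bytes_per_param):
    """Memory for the weights alone, in GiB (2**30 bytes), per dtype."""
    return {dtype: num_params * nbytes / 2**30
            for dtype, nbytes in bytes_per_param.items()}


for dtype, gib in weight_memory_gib(PARAMS, BYTES_PER_PARAM).items():
    print(f"{dtype}: {gib:.2f} GiB")
```

For 501M parameters this comes to roughly 1.87 GiB in F32 and 0.93 GiB in BF16.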


Model tree for Vikhrmodels/salt-qwen2.5-0.5b-tts

Base model: Qwen/Qwen2.5-0.5B

Finetuned: this model (one of 353 finetunes of the base model)

Datasets used to train Vikhrmodels/salt-qwen2.5-0.5b-tts

Collection including Vikhrmodels/salt-qwen2.5-0.5b-tts

SALT (3 items, updated 10 days ago)