Auto-ATT 🔊🤖

Automatically Evaluating the Human-likeness of TTS Systems via Audio-LLM-Based Score Regression

Auto-ATT is a LoRA fine-tune of Qwen2-Audio-Instruct. It offers a plug-and-play pipeline for grading “Audio Turing Tests” (ATTs) at scale, producing objective scores that correlate with human judgements, all without manual listening.

About Audio Turing Test (ATT)

ATT is an evaluation framework that pairs a standardized human evaluation protocol with an accompanying dataset. It aims to address the lack of unified protocols in TTS evaluation and the difficulty of comparing multiple TTS systems. To further support the training and iteration of TTS systems, we used additional private evaluation data to train the Auto-ATT model on top of Qwen2-Audio-7B, enabling a model-as-a-judge approach for rapid evaluation of TTS systems on the ATT dataset. The datasets and the Auto-ATT model can be found in the ATT Collection.
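As an illustration of how model-as-a-judge scores might be used to compare TTS systems, the sketch below averages per-clip scores into one score per system and checks rank agreement with human ratings via Spearman correlation. It is not the Auto-ATT inference code: the function names, system names, and all score values are hypothetical, and the actual score range produced by Auto-ATT is not assumed here.

```python
from statistics import mean


def system_scores(clip_scores):
    """Average per-clip scores into a single score per TTS system."""
    return {name: mean(scores) for name, scores in clip_scores.items()}


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-clip judge scores for three made-up systems.
auto_clip_scores = {
    "tts_a": [0.2, 0.4],
    "tts_b": [0.7, 0.9],
    "tts_c": [0.5, 0.5],
}
# Hypothetical system-level human ratings for the same systems.
human_scores = {"tts_a": 0.25, "tts_b": 0.8, "tts_c": 0.6}

auto = system_scores(auto_clip_scores)
names = sorted(auto)
rho = spearman([auto[n] for n in names], [human_scores[n] for n in names])
leaderboard = sorted(names, key=lambda n: auto[n], reverse=True)
```

With these made-up numbers the automatic and human rankings agree, so `rho` comes out at 1 and `leaderboard` lists `tts_b` first. Real evaluations would use many more clips per system.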

Usage

Inference Code

Datasets & Benchmarks

See ATT Collection.

Citation

@software{Auto-ATT,
  author = {Wang, Xihuai and Zhao, Ziyi and Ren, Siyu and Zhang, Shao and Li, Song and Li, Xiaoyu and Wang, Ziwen and Qiu, Lin and Wan, Guanglu and Cao, Xuezhi and Cai, Xunliang and Zhang, Weinan},
  title = {Audio Turing Test: Benchmarking the Human-likeness and Naturalness of Large Language Model-based Text-to-Speech Systems in Chinese},
  year = {2025},
  url = {https://huggingface.co/Meituan/Auto-ATT},
  publisher = {huggingface},
}

Model size: 8B params
Tensor type: BF16 (Safetensors)
Base model: Qwen/Qwen2-Audio-7B-Instruct
