Auto-ATT
Automatically Evaluating the Human-likeness of TTS Systems via Audio-LLM-Based Score Regression
Auto-ATT is a model LoRA-finetuned from Qwen2-Audio-Instruct. It offers a plug-and-play pipeline to grade "Audio Turing Tests" (ATTs) at scale, producing objective scores that correlate with human judgements, all without manual listening.
About Audio Turing Test (ATT)
ATT is an evaluation framework that pairs a standardized human evaluation protocol with an accompanying dataset. It aims to address the lack of unified protocols in TTS evaluation and the difficulty of comparing multiple TTS systems. To further support the training and iteration of TTS systems, we used additional private evaluation data to train the Auto-ATT model based on Qwen2-Audio-7B, enabling a model-as-a-judge approach for rapid evaluation of TTS systems on the ATT dataset. The datasets and the Auto-ATT model can be found in the ATT Collection.
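To illustrate how model-as-a-judge scores might be used to compare TTS systems, here is a minimal sketch in plain Python. It averages per-clip scores into one score per system and then checks rank agreement with human ratings via Spearman correlation. The system names, score values, score range, and helper functions are all hypothetical; this is not the Auto-ATT pipeline itself, only one way its outputs could be post-processed.

```python
from collections import defaultdict
from statistics import mean


def mean_score_per_system(clip_scores):
    """Average (system, score) pairs into one score per TTS system."""
    by_system = defaultdict(list)
    for system, score in clip_scores:
        by_system[system].append(score)
    return {system: mean(scores) for system, scores in by_system.items()}


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-clip judge scores (assumed here to lie in [0, 1]).
auto_clip_scores = [
    ("tts_a", 0.82), ("tts_a", 0.78),
    ("tts_b", 0.41), ("tts_b", 0.49),
    ("tts_c", 0.63), ("tts_c", 0.57),
]
# Hypothetical system-level human ratings on the same scale.
human_system_scores = {"tts_a": 0.75, "tts_b": 0.40, "tts_c": 0.66}

auto_system_scores = mean_score_per_system(auto_clip_scores)
systems = sorted(auto_system_scores)
rho = spearman(
    [auto_system_scores[s] for s in systems],
    [human_system_scores[s] for s in systems],
)
print(f"Spearman correlation between judge and human scores: {rho:.3f}")
```

Rank correlation is used here because comparing systems depends mainly on their ordering. With only a handful of systems the estimate is coarse, so a real evaluation would need many more systems or clip-level comparisons.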
Usage
Datasets & Benchmarks
See ATT Collection.
Citation
@software{Auto-ATT,
author = {Wang, Xihuai and Zhao, Ziyi and Ren, Siyu and Zhang, Shao and Li, Song and Li, Xiaoyu and Wang, Ziwen and Qiu, Lin and Wan, Guanglu and Cao, Xuezhi and Cai, Xunliang and Zhang, Weinan},
title = {Audio Turing Test: Benchmarking the Human-likeness and Naturalness of Large Language Model-based Text-to-Speech Systems in Chinese},
year = {2025},
url = {https://huggingface.co/Meituan/Auto-ATT},
publisher = {huggingface},
}
Model tree for meituan/Auto-ATT
Base model: Qwen/Qwen2-Audio-7B-Instruct