
Auto-ATT 🔊🤖

Automatically Evaluating the Human-likeness of TTS Systems via Audio-LLM-Based Score Regression

Auto-ATT is a model LoRA-finetuned from Qwen2-Audio-Instruct. It offers a plug-and-play pipeline to grade "Audio Turing Tests" (ATTs) at scale, producing objective scores that correlate with human judgements, all without manual listening.
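One common way to check how well automatic scores track human judgements is a rank correlation. The snippet below is a minimal sketch of Spearman's rho in plain Python. It is not the evaluation code used for Auto-ATT, and the score lists are hypothetical placeholders rather than real results.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-clip scores (illustrative only, not real Auto-ATT output).
model_scores = [0.91, 0.35, 0.62, 0.18, 0.77]
human_scores = [0.80, 0.40, 0.55, 0.10, 0.85]

rho = spearman(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f}")
```

A rho close to 1 means the automatic scores order the clips almost the same way human listeners do.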

About Audio Turing Test (ATT)

ATT is an evaluation framework with a standardized human evaluation protocol and an accompanying dataset. It aims to resolve the lack of unified protocols in TTS evaluation and the difficulty of comparing multiple TTS systems. To further support the training and iteration of TTS systems, we used additional private evaluation data to train the Auto-ATT model based on Qwen2-Audio-7B, enabling a model-as-a-judge approach for rapid evaluation of TTS systems on the ATT dataset. The datasets and the Auto-ATT model can be found in the ATT Collection.
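To compare several TTS systems with a model-as-a-judge, per-clip scores are typically aggregated per system and then ranked. The following is a minimal sketch of that aggregation. The record layout, the `system` and `score` field names, the system names, and the score values are all assumptions made for illustration, not the Auto-ATT output format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-clip judge scores; higher is assumed to mean more human-like.
records = [
    {"system": "tts_a", "score": 0.8},
    {"system": "tts_a", "score": 0.6},
    {"system": "tts_b", "score": 0.5},
    {"system": "tts_b", "score": 0.3},
    {"system": "human_reference", "score": 0.9},
    {"system": "human_reference", "score": 1.0},
]


def rank_systems(records):
    """Group scores by system and return (system, mean score) pairs, best first."""
    by_system = defaultdict(list)
    for record in records:
        by_system[record["system"]].append(record["score"])
    means = {name: mean(scores) for name, scores in by_system.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


leaderboard = rank_systems(records)
for name, avg in leaderboard:
    print(f"{name}: {avg:.2f}")
```

Averaging over many clips per system keeps a single unusually good or bad clip from dominating the comparison.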

Usage

Inference Code

Datasets & Benchmarks

See ATT Collection.

Citation

@software{Auto-ATT,
  author = {Wang, Xihuai and Zhao, Ziyi and Ren, Siyu and Zhang, Shao and Li, Song and Li, Xiaoyu and Wang, Ziwen and Qiu, Lin and Wan, Guanglu and Cao, Xuezhi and Cai, Xunliang and Zhang, Weinan},
  title = {Audio Turing Test: Benchmarking the Human-likeness and Naturalness of Large Language Model-based Text-to-Speech Systems in Chinese},
  year = {2025},
  url = {https://huggingface.co/Meituan/Auto-ATT},
  publisher = {huggingface},
}