Whisper-Large for Broad Accent Classification
Model Description
This model implements the broad accent classification described in Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits (https://arxiv.org/pdf/2505.14648).
The included English accents are:
['British Isles', 'North America', 'Other']
How to use this model
Download repo
git clone git@github.com:tiantiaf0627/vox-profile-release.git
Install the package
conda create -n vox_profile python=3.8
conda activate vox_profile
cd vox-profile-release
pip install -e .
Load the model
import torch
import torch.nn.functional as F
from src.model.accent.whisper_accent import WhisperWrapper
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = WhisperWrapper.from_pretrained("tiantiaf/whisper-large-v3-broad-accent").to(device)
model.eval()
Prediction
english_accent_list = [
'British Isles', 'North America', 'Other'
]
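# Inputs are capped at 15 seconds of audio sampled at 16 kHz
max_audio_length = 15 * 16000
# Placeholder input: 1 second of silence, shape [batch, samples]; replace with your own waveform
data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
with torch.no_grad():
    logits, embeddings = model(data, return_feature=True)
# Convert logits to probabilities over the three accent groups
accent_prob = F.softmax(logits, dim=1)
print(english_accent_list[torch.argmax(accent_prob, dim=1).item()])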
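The snippet above truncates the input to 15 seconds and prints only the top label. For longer recordings, or to inspect the full probability distribution, two small helpers may be useful. The following is a minimal sketch in plain Python, not part of the vox-profile-release package: `window_bounds` and `rank_accents` are hypothetical names, and splitting a long recording into consecutive 15-second windows is an assumption about how one might handle long audio, not the authors' documented method.

```python
import math

SAMPLE_RATE = 16000   # implied by max_audio_length = 15 * 16000 above
MAX_SECONDS = 15      # same 15-second cap as the prediction snippet
ACCENT_LABELS = ['British Isles', 'North America', 'Other']


def window_bounds(num_samples, sample_rate=SAMPLE_RATE, max_seconds=MAX_SECONDS):
    """Return (start, end) sample indices that tile a recording into
    consecutive windows of at most max_seconds each (hypothetical helper)."""
    window = sample_rate * max_seconds
    return [(start, min(start + window, num_samples))
            for start in range(0, num_samples, window)]


def rank_accents(logits, labels=ACCENT_LABELS):
    """Softmax over one row of logits, returned as (label, probability)
    pairs sorted from most to least likely (hypothetical helper)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # A 40-second recording at 16 kHz splits into 15 s + 15 s + 10 s windows.
    print(window_bounds(40 * SAMPLE_RATE))
    # Made-up logits, used only to show the output format.
    for label, prob in rank_accents([0.2, 2.1, -1.0]):
        print(f"{label}: {prob:.3f}")
```

With real tensors, each window can be sliced as `data[:, start:end]` before calling the model, and a single row of logits can be passed in as `logits[0].tolist()`. How to combine predictions across windows is left open here.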
If you have any questions, please contact: Tiantian Feng ([email protected])
Responsible use of the Model: The Model is released under the Open RAIL license. Users should respect the privacy and consent of data subjects and adhere to the relevant laws and regulations in their jurisdictions when using the model.
❌ Out-of-Scope Use
- Clinical or diagnostic applications
- Surveillance
- Privacy-invasive applications
- Commercial use