fix and make functional, add more datasets
- CONTRIBUTING.md +1 -1
- DEVELOPMENT.md +45 -26
- README.md +9 -14
- app/app.py +76 -48
- app/data.py +0 -180
- app/data/test/cache-38f74914f01da443.arrow +3 -0
- app/data/test/cache-43bad43a3f17100a.arrow +3 -0
- app/data/test/cache-7fc832a0865b46e3.arrow +3 -0
- app/data/test/cache-8e3b20205f12c8bf.arrow +3 -0
- app/data/test/cache-9a41aaef1a199c0a.arrow +3 -0
- app/data/test/cache-9a81afba5c72d77e.arrow +3 -0
- app/data/test/cache-bf2efb6be770547b.arrow +3 -0
- app/data/test/cache-ceccabba78df3ad3.arrow +3 -0
- app/data/test/cache-d8c639c50adcd3ec.arrow +3 -0
- app/data/test/cache-f9690e73716e8fdd.arrow +3 -0
- app/data/test/data-00000-of-00001.arrow +3 -0
- app/data/test/dataset_info.json +19 -0
- app/data/test/state.json +17 -0
- app/hf.py +111 -0
- app/inference.py +36 -148
- app/metrics.py +33 -0
- app/phone_metrics.py +0 -108
- app/queue/leaderboard.json +0 -192
- app/queue/results.json +0 -1014
- app/queue/tasks.json +0 -237
- app/tasks.py +84 -191
- requirements.txt +13 -7
- requirements_lock.txt +98 -26
- scripts/download_data_curl.sh +0 -3
- scripts/download_data_lfs.sh +0 -2
- scripts/install.sh +0 -19
- scripts/run-dev.sh +0 -6
- scripts/run-prod.sh +0 -6
- scripts/sample_test_set.py +33 -0
CONTRIBUTING.md
CHANGED
@@ -1,7 +1,7 @@
|
|
1 |
# Contributing to Koel Labs - IPA Transcription EN
|
2 |
👍🎉 First off, thanks for taking the time to contribute! 🎉👍
|
3 |
|
4 |
-
These are the specific contributing guidelines for the English IPA transcription leaderboard.
|
5 |
|
6 |
## Where to Start
|
7 |
|
|
|
1 |
# Contributing to Koel Labs - IPA Transcription EN
|
2 |
👍🎉 First off, thanks for taking the time to contribute! 🎉👍
|
3 |
|
4 |
+
These are the specific contributing guidelines for the English IPA transcription leaderboard. Check out our [general contributing guidelines here](https://github.com/KoelLabs/.github/blob/main/CONTRIBUTING.md).
|
5 |
|
6 |
## Where to Start
|
7 |
|
DEVELOPMENT.md
CHANGED
@@ -2,47 +2,69 @@
|
|
2 |
|
3 |
## Design Decisions
|
4 |
|
5 |
-
We specifically opt for a single-space leaderboard for simplicity. We solve the issue of keeping the gradio UI interactive while models are evaluating by using
|
6 |
|
7 |
-
## Setup
|
8 |
|
9 |
### Prerequisites
|
10 |
|
11 |
-
* Python 3.10
|
12 |
-
* Git
|
13 |
* A love for speech recognition! 🎤
|
14 |
|
15 |
### Quick Installation
|
16 |
|
17 |
-
|
18 |
```bash
|
19 |
-
|
20 |
-
cd IPA-Transcription-EN
|
21 |
```
|
22 |
|
23 |
-
|
24 |
```bash
|
25 |
-
.
|
26 |
```
|
27 |
|
28 |
-
|
29 |
```bash
|
30 |
-
|
31 |
```
|
32 |
|
33 |
4. Visit `http://localhost:7860` in your browser and see the magic! ✨
|
34 |
|
35 |
-
|
|
|
|
|
|
|
|
|
36 |
0. Activate the virtual environment with `. ./venv/bin/activate`
|
37 |
1. Add the dependency to `requirements.txt` (or remove it)
|
38 |
-
2. Make sure you have no unused dependencies with `pipx run deptry .`
|
39 |
3. Run `pip install -r requirements.txt`
|
40 |
4. Freeze the dependencies with `pip freeze > requirements_lock.txt`
|
41 |
|
42 |
-
##
|
43 |
-
|
44 |
-
.
|
45 |
-
|
|
|
|
|
46 |
|
47 |
## File Structure
|
48 |
|
@@ -56,19 +78,16 @@ IPA-Transcription-EN/
|
|
56 |
├── requirements.txt # Python dependencies
|
57 |
├── requirements_lock.txt # Locked dependencies
|
58 |
├── scripts # Helper scripts
|
59 |
-
│ ├──
|
|
|
60 |
│ └── run-dev.sh # Run the leaderboard in development mode
|
61 |
├── venv # Virtual environment
|
62 |
├── app/ # All application code lives here
|
63 |
-
│ ├── data/ # Phoneme transcription
|
64 |
-
│ ├── queue/ # Stores leaderboard state and task status
|
65 |
-
│ | ├── tasks.json # Task queue
|
66 |
-
│ | ├── results.json # Detailed evaluation results
|
67 |
-
│ | └── leaderboard.json # Compact results for leaderboard display
|
68 |
│ ├── app.py # Main Gradio UI
|
69 |
-
│ ├──
|
70 |
-
│ ├── data.py # Data loading and processing
|
71 |
│ ├── inference.py # Model inference
|
72 |
-
│ └──
|
|
|
73 |
└── img/ # Images for README and other documentation
|
74 |
```
|
|
|
2 |
|
3 |
## Design Decisions
|
4 |
|
5 |
+
We specifically opt for a single-space leaderboard for simplicity. To keep the Gradio UI interactive while models are evaluating, we use multiprocessing instead of a separate space. Leaderboard entries are persisted in a Hugging Face Dataset to avoid paying for persistent storage. Tasks are deliberately ephemeral.
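A minimal sketch of the multiprocessing idea described above, not the code in `app/tasks.py` (whose body is not shown in this diff). The names `evaluate_model` and `start_background_eval` are hypothetical, and the sleep stands in for real inference. The point is that the parent process returns immediately after starting the child, so a UI event handler would not block.

```python
import multiprocessing as mp
import time


def evaluate_model(model_id: str, results) -> None:
    """Hypothetical stand-in for a slow evaluation job."""
    time.sleep(0.2)  # a real job would run inference and compute metrics here
    results.put({"model_id": model_id, "status": "completed"})


def start_background_eval(model_id: str, ctx=None):
    """Start the evaluation in a child process and return without waiting."""
    ctx = ctx or mp.get_context()  # caller may pass a specific start-method context
    results = ctx.Queue()  # channel the child uses to report back
    proc = ctx.Process(target=evaluate_model, args=(model_id, results), daemon=True)
    proc.start()
    return proc, results
```

The caller can poll `proc.is_alive()` or read from `results` later. Because the state lives only in memory, it is lost on restart, which matches the "deliberately ephemeral" tasks mentioned above.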
|
6 |
|
7 |
+
## Local Setup
|
8 |
|
9 |
### Prerequisites
|
10 |
|
11 |
+
* [Python 3.10](https://www.python.org/downloads/release/python-31017/)
|
12 |
+
* [Git](https://git-scm.com/downloads)
|
13 |
* A love for speech recognition! 🎤
|
14 |
|
15 |
### Quick Installation
|
16 |
|
17 |
+
0. Make sure git-lfs is installed (https://git-lfs.com)
|
18 |
```bash
|
19 |
+
git lfs install
|
|
|
20 |
```
|
21 |
|
22 |
+
1. Clone this repository:
|
23 |
```bash
|
24 |
+
git clone https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN
|
25 |
```
|
26 |
|
27 |
+
2. Set up your environment:
|
28 |
```bash
|
29 |
+
# Create a virtual environment with Python 3.10
|
30 |
+
python3.10 -m venv venv
|
31 |
+
|
32 |
+
# Activate the virtual environment
|
33 |
+
. ./venv/bin/activate
|
34 |
+
# use `deactivate` to exit out of it
|
35 |
+
|
36 |
+
# Install the required dependencies
|
37 |
+
pip install -r requirements_lock.txt
|
38 |
+
|
39 |
+
# Add a HF_TOKEN with access to your backing dataset (in app/hf.py) and any models you want to be able to run
|
40 |
+
huggingface-cli login
|
41 |
```
|
42 |
|
43 |
+
3. Launch the leaderboard:
|
44 |
+
```bash
|
45 |
+
. ./scripts/run-dev.sh # development mode (auto-reloads)
|
46 |
+
. ./scripts/run-prod.sh # production mode (no auto-reloads)
|
47 |
+
```
|
48 |
+
|
49 |
4. Visit `http://localhost:7860` in your browser and see the magic! ✨
|
50 |
|
51 |
+
### Adding New Datasets
|
52 |
+
|
53 |
+
The datasets are pre-processed into a single dataset stored in `app/data/test` with three columns: audio (16 kHz), ipa, and dataset (original source). This is done using the `scripts/sample_test_set.py` file. To add new datasets, add them to this script. Beware that existing leaderboard entries will need to be recalculated. You can do this locally by accessing the dataset corresponding to `LEADERBOARD_ID` stored in `app/hf.py`.
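As a rough illustration of the three-column layout described above, the following sketch checks a single row before it is added to the combined test set. It is not part of `scripts/sample_test_set.py`. The function name `check_row` is hypothetical, and the audio layout (a dict with `array` and `sampling_rate` keys) is an assumption about how the audio column is decoded.

```python
EXPECTED_COLUMNS = ("audio", "ipa", "dataset")
TARGET_SAMPLE_RATE = 16_000  # the test set stores audio at 16 kHz


def check_row(row: dict) -> list[str]:
    """Return a list of problems found in one row (empty list means the row looks fine)."""
    problems = []
    for column in EXPECTED_COLUMNS:
        if column not in row:
            problems.append(f"missing column: {column}")

    # Assumed audio layout: {"array": [...], "sampling_rate": int}
    audio = row.get("audio")
    if isinstance(audio, dict) and audio.get("sampling_rate") != TARGET_SAMPLE_RATE:
        problems.append(
            f"unexpected sampling rate: {audio.get('sampling_rate')} (want {TARGET_SAMPLE_RATE})"
        )

    if "ipa" in row and not str(row["ipa"]).strip():
        problems.append("empty ipa transcription")
    return problems
```

Running a check like this over each new source dataset before merging can catch resampling mistakes early, since a mismatch would otherwise only show up as poor scores.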
|
54 |
+
|
55 |
+
### Adding/Removing Dependencies
|
56 |
0. Activate the virtual environment with `. ./venv/bin/activate`
|
57 |
1. Add the dependency to `requirements.txt` (or remove it)
|
58 |
+
2. Make sure you have no unused dependencies with `pipx run deptry .` (if necessary `python -m pip install pipx`)
|
59 |
3. Run `pip install -r requirements.txt`
|
60 |
4. Freeze the dependencies with `pip freeze > requirements_lock.txt`
|
61 |
|
62 |
+
## Forking Into Your Own Leaderboard
|
63 |
+
|
64 |
+
0. Navigate to [the space](https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN), click the three dots on the right and select `Duplicate this Space`
|
65 |
+
1. Modify the `LEADERBOARD_ID` in `app/hf.py` to be some dataset that you own that the new space can use to store data. You don't need to create the dataset but if you do, it should be empty.
|
66 |
+
2. Open the settings in your new space and add a new secret `HF_TOKEN`. You can [create it here](https://huggingface.co/settings/tokens). It just needs read access to all models you want to add to the leaderboard and write access to the private backing dataset specified by `LEADERBOARD_ID`.
|
67 |
+
3. Submit some models and enjoy!
|
68 |
|
69 |
## File Structure
|
70 |
|
|
|
78 |
├── requirements.txt # Python dependencies
|
79 |
├── requirements_lock.txt # Locked dependencies
|
80 |
├── scripts # Helper scripts
|
81 |
+
│ ├── sample_test_set.py # Compute the combined test set
|
82 |
+
│ ├── run-prod.sh # Run the leaderboard in production mode
|
83 |
│ └── run-dev.sh # Run the leaderboard in development mode
|
84 |
├── venv # Virtual environment
|
85 |
├── app/ # All application code lives here
|
86 |
+
│ ├── data/ # Phoneme transcription test set
|
|
|
|
|
|
|
|
|
87 |
│ ├── app.py # Main Gradio UI
|
88 |
+
│ ├── hf.py # Interface with the Huggingface API
|
|
|
89 |
│ ├── inference.py # Model inference
|
90 |
+
│ ├── metrics.py # Evaluation metrics
|
91 |
+
│ └── tasks.py # Background tasks for model evaluation
|
92 |
└── img/ # Images for README and other documentation
|
93 |
```
|
README.md
CHANGED
@@ -13,6 +13,8 @@ thumbnail: >-
|
|
13 |
short_description: Speech-to-phoneme leaderboard
|
14 |
---
|
15 |
|
|
|
|
|
16 |
# 🎯 English Phonemic Transcription Leaderboard
|
17 |
|
18 |
Welcome to the English Phonemic Transcription Leaderboard! This simple leaderboard helps track and compare the performance of different speech-to-phoneme models. Feel free to fork it for your own hugging face leaderboards!
|
@@ -30,13 +32,12 @@ Welcome to the English Phonemic Transcription Leaderboard! This simple leaderboa
|
|
30 |
|
31 |
This leaderboard tracks two key metrics for phonemic transcription models:
|
32 |
|
33 |
-
|
34 |
* **PER (Phoneme Error Rate)**: How accurately your model converts speech to phonemes
|
35 |
-
* **
|
36 |
|
37 |
Read more about evaluations on our [blog](https://www.koellabs.com/blog/phonemic-transcription-metrics)
|
38 |
|
39 |
-
Models are evaluated on
|
40 |
|
41 |
## 🚀 Getting Started
|
42 |
|
@@ -48,7 +49,7 @@ Navigate to the hosted version on [Hugging Face](https://huggingface.co/spaces/K
|
|
48 |
|
49 |
1. Go to the "Submit Model" tab
|
50 |
2. Enter your model details:
|
51 |
-
* Model
|
52 |
* Submission name (e.g., "MyAwesomeModel v1.0")
|
53 |
* GitHub/Kaggle/HuggingFace URL (optional)
|
54 |
3. Click Submit and watch your model climb the ranks! 🚀
|
@@ -56,7 +57,7 @@ Navigate to the hosted version on [Hugging Face](https://huggingface.co/spaces/K
|
|
56 |
### Checking Model Status
|
57 |
|
58 |
1. Navigate to the "Model Status" tab
|
59 |
-
2. Enter your model
|
60 |
3. Get real-time updates on your model's evaluation progress
|
61 |
|
62 |
## 📊 Understanding the Results
|
@@ -64,7 +65,7 @@ Navigate to the hosted version on [Hugging Face](https://huggingface.co/spaces/K
|
|
64 |
The leaderboard shows:
|
65 |
|
66 |
* Model names and submission details
|
67 |
-
* PER and
|
68 |
* Links to model repositories
|
69 |
* Submission dates
|
70 |
|
@@ -86,7 +87,7 @@ Want to make this leaderboard even better? We'd love your help! Here are some wa
|
|
86 |
* Submit bug fixes
|
87 |
* Add new features
|
88 |
|
89 |
-
|
90 |
|
91 |
## 📝 License
|
92 |
|
@@ -94,12 +95,6 @@ This project is licensed under the GNU Affero General Public License.
|
|
94 |
|
95 |
We retain all rights to the Koel Labs brand, logos, blog posts and website content.
|
96 |
|
97 |
-
## 🌟 Acknowledgments
|
98 |
-
|
99 |
-
* Thanks to the TIMIT speech corpus for providing evaluation data
|
100 |
-
* Shoutout to the [panphon library](https://github.com/dmort27/panphon) for PWED calculations
|
101 |
-
* Built with love by Koel Labs 💙
|
102 |
-
|
103 |
## 🆘 Need Help?
|
104 |
|
105 |
Got questions? Found a bug? Want to contribute? [Open an issue](https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN/discussions) or [reach out to us](mailto:[email protected])! We're here to help make speech recognition evaluation fun and accessible for everyone!
|
@@ -108,4 +103,4 @@ Remember: Every great model deserves its moment to shine! 🌟
|
|
108 |
|
109 |
---
|
110 |
|
111 |
-
Happy Transcribing! 🎤✨
|
|
|
13 |
short_description: Speech-to-phoneme leaderboard
|
14 |
---
|
15 |
|
16 |
+

|
17 |
+
|
18 |
# 🎯 English Phonemic Transcription Leaderboard
|
19 |
|
20 |
Welcome to the English Phonemic Transcription Leaderboard! This simple leaderboard helps track and compare the performance of different speech-to-phoneme models. Feel free to fork it for your own hugging face leaderboards!
|
|
|
32 |
|
33 |
This leaderboard tracks two key metrics for phonemic transcription models:
|
34 |
|
|
|
35 |
* **PER (Phoneme Error Rate)**: How accurately your model converts speech to phonemes
|
36 |
+
* **FER (Feature Error Rate)**: A more nuanced metric that considers phonemic features
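To make the PER definition concrete, here is a small sketch of a Levenshtein-based phoneme error rate. It is not the code in `app/metrics.py`. It assumes the transcriptions are already split into lists of phonemes and that the distance is normalized by the reference length; the function names are illustrative. FER would replace the unit substitution cost with a feature-based cost, which is not shown here.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # delete r from the reference
                    curr[j - 1] + 1,  # insert h into the reference
                    prev[j - 1] + (r != h),  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Edit distance divided by reference length (assumed normalization)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, with reference `["k", "æ", "t"]` and prediction `["k", "ɛ", "t", "s"]` there is one substitution and one insertion, so the distance is 2 and the PER is 2/3.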
|
37 |
|
38 |
Read more about evaluations on our [blog](https://www.koellabs.com/blog/phonemic-transcription-metrics)
|
39 |
|
40 |
+
Models are evaluated on a variety of English speech: native, non-native, and impaired.
|
41 |
|
42 |
## 🚀 Getting Started
|
43 |
|
|
|
49 |
|
50 |
1. Go to the "Submit Model" tab
|
51 |
2. Enter your model details:
|
52 |
+
* Model ID (e.g., "my-name/wav2vec2-phoneme-wizard")
|
53 |
* Submission name (e.g., "MyAwesomeModel v1.0")
|
54 |
* GitHub/Kaggle/HuggingFace URL (optional)
|
55 |
3. Click Submit and watch your model climb the ranks! 🚀
|
|
|
57 |
### Checking Model Status
|
58 |
|
59 |
1. Navigate to the "Model Status" tab
|
60 |
+
2. Enter your model ID or task ID
|
61 |
3. Get real-time updates on your model's evaluation progress
|
62 |
|
63 |
## 📊 Understanding the Results
|
|
|
65 |
The leaderboard shows:
|
66 |
|
67 |
* Model names and submission details
|
68 |
+
* PER and FER scores (lower is better!)
|
69 |
* Links to model repositories
|
70 |
* Submission dates
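The `app/app.py` changes in this commit keep only the most recent submission per `repo_id` and then sort ascending by the chosen metric. The sketch below mirrors that logic in plain Python rather than pandas. The function name `latest_per_model` is hypothetical, and it assumes timestamps are ISO-formatted strings that sort chronologically.

```python
def latest_per_model(rows: list[dict], sort_key: str = "average_fer") -> list[dict]:
    """Keep the newest row for each repo_id, then sort so the lowest score comes first."""
    newest_first = sorted(rows, key=lambda r: r["submission_timestamp"], reverse=True)

    seen = set()
    latest = []
    for row in newest_first:
        if row["repo_id"] not in seen:  # first hit is the newest submission
            seen.add(row["repo_id"])
            latest.append(row)

    # Lower PER/FER is better, so sort ascending
    return sorted(latest, key=lambda r: r[sort_key])
```

One consequence of this design is that resubmitting a model replaces its displayed score even if the new score is worse, since recency wins over the metric.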
|
71 |
|
|
|
87 |
* Submit bug fixes
|
88 |
* Add new features
|
89 |
|
90 |
+
Check out the [CONTRIBUTING.md](CONTRIBUTING.md) for more details.
|
91 |
|
92 |
## 📝 License
|
93 |
|
|
|
95 |
|
96 |
We retain all rights to the Koel Labs brand, logos, blog posts and website content.
|
97 |
|
|
|
|
|
|
|
|
|
|
|
|
|
98 |
## 🆘 Need Help?
|
99 |
|
100 |
Got questions? Found a bug? Want to contribute? [Open an issue](https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN/discussions) or [reach out to us](mailto:[email protected])! We're here to help make speech recognition evaluation fun and accessible for everyone!
|
|
|
103 |
|
104 |
---
|
105 |
|
106 |
+
Happy Transcribing! 🎤✨
|
app/app.py
CHANGED
@@ -1,50 +1,66 @@
|
|
1 |
# This is the main module that handles rendering the Gradio interface.
|
2 |
-
|
3 |
-
# Note: gradio will automatically create REST API endpoints for the functions that are used as event handlers in the interface.
|
4 |
|
5 |
import gradio as gr
|
6 |
import pandas as pd
|
7 |
|
8 |
-
from tasks import start_eval_task,
|
|
|
9 |
|
10 |
|
11 |
-
def get_latest_leaderboard_html(sort_option: str) -> str:
|
12 |
try:
|
13 |
# Get the latest leaderboard data
|
14 |
-
df =
|
|
|
|
|
|
|
|
|
15 |
|
16 |
-
# Sort the dataframe so smallest PER or
|
17 |
-
sort_column = "average_per" if sort_option.lower() == "per" else "
|
18 |
df = df.sort_values(by=sort_column, ascending=True)
|
19 |
|
20 |
# Format the dataframe for HTML display
|
21 |
df = pd.DataFrame(
|
22 |
{
|
23 |
-
"Model": df
|
24 |
-
|
25 |
-
|
26 |
-
|
27 |
lambda x: (
|
28 |
f'<a href="{x}" target="_blank">Repository</a>' if x else "N/A"
|
29 |
)
|
30 |
),
|
31 |
-
"Submission Date": pd.to_datetime(
|
32 |
-
"
|
33 |
-
),
|
34 |
}
|
35 |
)
|
36 |
return df.to_html(escape=False, index=False, classes="styled-table")
|
37 |
except Exception as e:
|
38 |
-
|
39 |
-
return "Error updating leaderboard"
|
40 |
|
41 |
|
42 |
-
def submit_evaluation(
|
43 |
-
|
|
|
|
|
44 |
return "⚠️ Please provide both model name and submission name."
|
45 |
|
46 |
try:
|
47 |
-
task_id = start_eval_task(
|
48 |
return f"✅ Evaluation submitted successfully! Task ID: {task_id}"
|
49 |
except Exception as e:
|
50 |
return f"❌ Error: {str(e)}"
|
@@ -58,7 +74,6 @@ with gr.Blocks(
|
|
58 |
margin: 25px 0;
|
59 |
font-size: 0.9em;
|
60 |
font-family: sans-serif;
|
61 |
-
box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
|
62 |
}
|
63 |
.styled-table thead tr {
|
64 |
background: linear-gradient(45deg, #092746, #073562, #0A648F);
|
@@ -75,22 +90,18 @@ with gr.Blocks(
|
|
75 |
}
|
76 |
"""
|
77 |
) as demo:
|
78 |
-
gr.Markdown("# 🎯 English
|
79 |
gr.Markdown("#### Developed By: [Koel Labs](https://koellabs.com)")
|
80 |
gr.Markdown(
|
81 |
"""
|
82 |
-
##
|
|
|
|
|
83 |
- **PER (Phoneme Error Rate)**: The Levenshtein distance calculated between phoneme sequences of the predicted and actual transcriptions.
|
84 |
-
- **
|
85 |
|
86 |
-
Read more about evaluations on [our blog](https://www.koellabs.com/blog/phonemic-transcription-metrics)
|
87 |
-
|
88 |
-
)
|
89 |
-
gr.Markdown(
|
90 |
-
"""
|
91 |
-
## Test Set Information
|
92 |
-
The test set used for evaluation is from the [TIMIT speech corpus](https://www.kaggle.com/datasets/mfekadu/darpa-timit-acousticphonetic-continuous-speech). The TIMIT corpus is a widely used dataset for speech recognition research.
|
93 |
-
|
94 |
## Compute
|
95 |
This leaderboard uses the free basic plan (16GB RAM, 2 vCPUs) to allow for reproducibility. The evaluation may take several hours to complete. Please be patient and do not submit the same model multiple times.
|
96 |
|
@@ -100,38 +111,55 @@ with gr.Blocks(
|
|
100 |
)
|
101 |
with gr.Tabs():
|
102 |
with gr.TabItem("🏆 Leaderboard"):
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
103 |
with gr.Row(elem_classes="controls-row"):
|
104 |
-
# Controls side by side
|
105 |
sort_dropdown = gr.Dropdown(
|
106 |
-
choices=["
|
107 |
-
value="
|
108 |
interactive=True,
|
109 |
scale=2,
|
110 |
container=False, # Removes the box around the dropdown
|
111 |
-
label=None, # Removes the "Sort by" label
|
112 |
)
|
113 |
-
refresh_btn = gr.Button("Refresh 🔄", scale=2)
|
114 |
|
115 |
-
leaderboard_html = gr.HTML(
|
116 |
sort_dropdown.change(
|
117 |
fn=get_latest_leaderboard_html,
|
118 |
-
inputs=[sort_dropdown],
|
119 |
outputs=leaderboard_html,
|
120 |
)
|
121 |
refresh_btn.click(
|
122 |
fn=get_latest_leaderboard_html,
|
123 |
-
inputs=[sort_dropdown],
|
124 |
outputs=leaderboard_html,
|
125 |
)
|
126 |
|
127 |
with gr.TabItem("📝 Submit Model"):
|
128 |
-
|
129 |
-
label="Model
|
130 |
)
|
131 |
-
|
132 |
-
label="Submission Name", placeholder="
|
133 |
)
|
134 |
-
|
135 |
label="Github/Kaggle/HF URL (optional)",
|
136 |
placeholder="https://github.com/username/repo",
|
137 |
)
|
@@ -140,14 +168,14 @@ with gr.Blocks(
|
|
140 |
|
141 |
submit_btn.click(
|
142 |
fn=submit_evaluation,
|
143 |
-
inputs=[
|
144 |
outputs=result,
|
145 |
)
|
146 |
|
147 |
-
with gr.TabItem("📊
|
148 |
query = gr.Textbox(
|
149 |
-
label="Model
|
150 |
-
placeholder="Enter model
|
151 |
)
|
152 |
status_btn = gr.Button("Check Status")
|
153 |
status_output = gr.JSON(label="Status")
|
|
|
1 |
# This is the main module that handles rendering the Gradio interface.
|
2 |
+
# NOTE: gradio will automatically create REST API endpoints for the functions that are used as event handlers in the interface.
|
|
|
3 |
|
4 |
import gradio as gr
|
5 |
import pandas as pd
|
6 |
|
7 |
+
from tasks import start_eval_task, get_status
|
8 |
+
from hf import get_or_create_leaderboard
|
9 |
|
10 |
|
11 |
+
def get_latest_leaderboard_html(datasets: list[str], sort_option: str) -> str:
|
12 |
try:
|
13 |
# Get the latest leaderboard data
|
14 |
+
df: pd.DataFrame = get_or_create_leaderboard().sort("submission_timestamp", reverse=True).to_pandas() # type: ignore
|
15 |
+
df = df.drop_duplicates("repo_id", keep="first")
|
16 |
+
|
17 |
+
if len(df) == 0:
|
18 |
+
return "No scores, please submit models for evaluation."
|
19 |
|
20 |
+
# Sort the dataframe so smallest PER or FER is at the top
|
21 |
+
sort_column = "average_per" if sort_option.lower() == "per" else "average_fer"
|
22 |
df = df.sort_values(by=sort_column, ascending=True)
|
23 |
|
24 |
# Format the dataframe for HTML display
|
25 |
df = pd.DataFrame(
|
26 |
{
|
27 |
+
"Model": df.apply(
|
28 |
+
lambda r: f'<a href="https://huggingface.co/{r["repo_id"]}" target="_blank">{r["display_name"]}</a>',
|
29 |
+
axis=1,
|
30 |
+
),
|
31 |
+
"Average PER ⬇️": df["average_per"].apply(lambda x: f"{100 * x:.2f}%"),
|
32 |
+
}
|
33 |
+
| {
|
34 |
+
f"{d} FER ⬇️": df["average_fer" if d == "Average" else f"fer_{d}"].apply(
|
35 |
+
lambda x: f"{100 * x:.2f}%"
|
36 |
+
)
|
37 |
+
for d in datasets
|
38 |
+
}
|
39 |
+
| {
|
40 |
+
"Link": df["url"].apply(
|
41 |
lambda x: (
|
42 |
f'<a href="{x}" target="_blank">Repository</a>' if x else "N/A"
|
43 |
)
|
44 |
),
|
45 |
+
"Submission Date": pd.to_datetime(
|
46 |
+
df["submission_timestamp"]
|
47 |
+
).dt.strftime("%Y-%m-%d"),
|
48 |
}
|
49 |
)
|
50 |
return df.to_html(escape=False, index=False, classes="styled-table")
|
51 |
except Exception as e:
|
52 |
+
|
53 |
+
return f"Error updating leaderboard: {type(e).__name__} - {e}"
|
54 |
|
55 |
|
56 |
+
def submit_evaluation(model_id: str, display_name: str, url: str) -> str:
|
57 |
+
model_id = model_id.strip()
|
58 |
+
display_name = display_name.strip()
|
59 |
+
if not model_id or not display_name:
|
60 |
return "⚠️ Please provide both model name and submission name."
|
61 |
|
62 |
try:
|
63 |
+
task_id = start_eval_task(display_name, model_id, url)
|
64 |
return f"✅ Evaluation submitted successfully! Task ID: {task_id}"
|
65 |
except Exception as e:
|
66 |
return f"❌ Error: {str(e)}"
|
|
|
74 |
margin: 25px 0;
|
75 |
font-size: 0.9em;
|
76 |
font-family: sans-serif;
|
|
|
77 |
}
|
78 |
.styled-table thead tr {
|
79 |
background: linear-gradient(45deg, #092746, #073562, #0A648F);
|
|
|
90 |
}
|
91 |
"""
|
92 |
) as demo:
|
93 |
+
gr.Markdown("# 🎯 English Speech2IPA Leaderboard")
|
94 |
gr.Markdown("#### Developed By: [Koel Labs](https://koellabs.com)")
|
95 |
gr.Markdown(
|
96 |
"""
|
97 |
+
## Evaluation
|
98 |
+
We use two standard metrics:
|
99 |
+
|
100 |
- **PER (Phoneme Error Rate)**: The Levenshtein distance calculated between phoneme sequences of the predicted and actual transcriptions.
|
101 |
+
- **FER (Feature Error Rate)**: The edit distance between the predicted and actual phoneme sequences, weighted by the phonetic features from [panphon](https://github.com/dmort27/panphon).
|
102 |
|
103 |
+
Models are evaluated on a variety of English speech: native, non-native, and impaired. Read more about evaluations on [our blog](https://www.koellabs.com/blog/phonemic-transcription-metrics)
|
104 |
+
|
|
|
|
|
|
|
|
|
|
|
|
|
105 |
## Compute
|
106 |
This leaderboard uses the free basic plan (16GB RAM, 2 vCPUs) to allow for reproducibility. The evaluation may take several hours to complete. Please be patient and do not submit the same model multiple times.
|
107 |
|
|
|
111 |
)
|
112 |
with gr.Tabs():
|
113 |
with gr.TabItem("🏆 Leaderboard"):
|
114 |
+
dataset_dropdown = gr.Dropdown(
|
115 |
+
choices=["Average", "TIMIT", "EpaDB", "PSST", "SpeechOcean", "ISLE"],
|
116 |
+
value=["Average"],
|
117 |
+
multiselect=True,
|
118 |
+
interactive=True,
|
119 |
+
scale=2,
|
120 |
+
container=False, # Removes the box around the dropdown
|
121 |
+
)
|
122 |
with gr.Row(elem_classes="controls-row"):
|
|
|
123 |
sort_dropdown = gr.Dropdown(
|
124 |
+
choices=["FER", "PER"],
|
125 |
+
value="FER",
|
126 |
interactive=True,
|
127 |
scale=2,
|
128 |
container=False, # Removes the box around the dropdown
|
|
|
129 |
)
|
130 |
+
refresh_btn = gr.Button("Refresh 🔄", scale=2)
|
131 |
|
132 |
+
leaderboard_html = gr.HTML("Loading Leaderboard...")
|
133 |
+
demo.load(
|
134 |
+
fn=get_latest_leaderboard_html,
|
135 |
+
inputs=[dataset_dropdown, sort_dropdown],
|
136 |
+
outputs=leaderboard_html,
|
137 |
+
show_progress="minimal",
|
138 |
+
)
|
139 |
+
dataset_dropdown.change(
|
140 |
+
fn=get_latest_leaderboard_html,
|
141 |
+
inputs=[dataset_dropdown, sort_dropdown],
|
142 |
+
outputs=leaderboard_html,
|
143 |
+
)
|
144 |
sort_dropdown.change(
|
145 |
fn=get_latest_leaderboard_html,
|
146 |
+
inputs=[dataset_dropdown, sort_dropdown],
|
147 |
outputs=leaderboard_html,
|
148 |
)
|
149 |
refresh_btn.click(
|
150 |
fn=get_latest_leaderboard_html,
|
151 |
+
inputs=[dataset_dropdown, sort_dropdown],
|
152 |
outputs=leaderboard_html,
|
153 |
)
|
154 |
|
155 |
with gr.TabItem("📝 Submit Model"):
|
156 |
+
model_id = gr.Textbox(
|
157 |
+
label="Model ID", placeholder="facebook/wav2vec2-lv-60-espeak-cv-ft"
|
158 |
)
|
159 |
+
display_name = gr.Textbox(
|
160 |
+
label="Submission Name", placeholder="Facebook Wav2Vec2 Espeak 60"
|
161 |
)
|
162 |
+
url = gr.Textbox(
|
163 |
label="Github/Kaggle/HF URL (optional)",
|
164 |
placeholder="https://github.com/username/repo",
|
165 |
)
|
|
|
168 |
|
169 |
submit_btn.click(
|
170 |
fn=submit_evaluation,
|
171 |
+
inputs=[model_id, display_name, url],
|
172 |
outputs=result,
|
173 |
)
|
174 |
|
175 |
+
with gr.TabItem("📊 Submission Status"):
|
176 |
query = gr.Textbox(
|
177 |
+
label="Model ID or Task ID",
|
178 |
+
placeholder="Enter model ID (e.g., facebook/wav2vec2-lv-60-espeak-cv-ft)",
|
179 |
)
|
180 |
status_btn = gr.Button("Check Status")
|
181 |
status_output = gr.JSON(label="Status")
|
app/data.py
DELETED
@@ -1,180 +0,0 @@
|
|
1 |
-
# This module handles the data loading and preprocessing for various phoneme transcription datasets.
|
2 |
-
|
3 |
-
import torch
|
4 |
-
import torchaudio
|
5 |
-
|
6 |
-
import zipfile
|
7 |
-
from pathlib import Path
|
8 |
-
|
9 |
-
# Get absolute path
|
10 |
-
CURRENT_DIR = Path(__file__).parent.absolute()
|
11 |
-
|
12 |
-
# Constants
|
13 |
-
DATA_DIR = CURRENT_DIR / "data"
|
14 |
-
TIMIT_PATH = DATA_DIR / "TIMIT.zip"
|
15 |
-
|
16 |
-
|
17 |
-
# Abstract data manager class
|
18 |
-
class DataManager:
|
19 |
-
"""Abstract class for handling dataset operations"""
|
20 |
-
|
21 |
-
def get_file_list(self, subset: str) -> list[str]:
|
22 |
-
"""Get list of files for given subset"""
|
23 |
-
raise NotImplementedError
|
24 |
-
|
25 |
-
def load_audio(self, filename: str) -> torch.Tensor:
|
26 |
-
"""Load and preprocess audio file"""
|
27 |
-
raise NotImplementedError
|
28 |
-
|
29 |
-
def get_phonemes(self, filename: str) -> str:
|
30 |
-
"""Get phoneme sequence from file"""
|
31 |
-
raise NotImplementedError
|
32 |
-
|
33 |
-
|
34 |
-
# Implement datasets
|
35 |
-
class TimitDataManager(DataManager):
|
36 |
-
"""Handles all TIMIT dataset operations"""
|
37 |
-
|
38 |
-
# TIMIT to IPA mapping with direct simplifications
|
39 |
-
_TIMIT_TO_IPA = {
|
40 |
-
# Vowels (simplified)
|
41 |
-
"aa": "ɑ",
|
42 |
-
"ae": "æ",
|
43 |
-
"ah": "ʌ",
|
44 |
-
"ao": "ɔ",
|
45 |
-
"aw": "aʊ",
|
46 |
-
"ay": "aɪ",
|
47 |
-
"eh": "ɛ",
|
48 |
-
"er": "ɹ", # Simplified from 'ɝ'
|
49 |
-
"ey": "eɪ",
|
50 |
-
"ih": "ɪ",
|
51 |
-
"ix": "i", # Simplified from 'ɨ'
|
52 |
-
"iy": "i",
|
53 |
-
"ow": "oʊ",
|
54 |
-
"oy": "ɔɪ",
|
55 |
-
"uh": "ʊ",
|
56 |
-
"uw": "u",
|
57 |
-
"ux": "u", # Simplified from 'ʉ'
|
58 |
-
"ax": "ə",
|
59 |
-
"ax-h": "ə", # Simplified from 'ə̥'
|
60 |
-
"axr": "ɹ", # Simplified from 'ɚ'
|
61 |
-
# Consonants
|
62 |
-
"b": "",
|
63 |
-
"bcl": "b",
|
64 |
-
"d": "",
|
65 |
-
"dcl": "d",
|
66 |
-
"g": "",
|
67 |
-
"gcl": "g",
|
68 |
-
"p": "",
|
69 |
-
"pcl": "p",
|
70 |
-
"t": "",
|
71 |
-
"tcl": "t",
|
72 |
-
"k": "",
|
73 |
-
"kcl": "k",
|
74 |
-
"dx": "ɾ",
|
75 |
-
"q": "ʔ",
|
76 |
-
# Fricatives
|
77 |
-
"jh": "dʒ",
|
78 |
-
"ch": "tʃ",
|
79 |
-
"s": "s",
|
80 |
-
"sh": "ʃ",
|
81 |
-
"z": "z",
|
82 |
-
"zh": "ʒ",
|
83 |
-
"f": "f",
|
84 |
-
"th": "θ",
|
85 |
-
"v": "v",
|
86 |
-
"dh": "ð",
|
87 |
-
"hh": "h",
|
88 |
-
"hv": "h", # Simplified from 'ɦ'
|
89 |
-
# Nasals (simplified)
|
90 |
-
"m": "m",
|
91 |
-
"n": "n",
|
92 |
-
"ng": "ŋ",
|
93 |
-
"em": "m", # Simplified from 'm̩'
|
94 |
-
"en": "n", # Simplified from 'n̩'
|
95 |
-
"eng": "ŋ", # Simplified from 'ŋ̍'
|
96 |
-
"nx": "ɾ", # Simplified from 'ɾ̃'
|
97 |
-
# Semivowels and Glides
|
98 |
-
"l": "l",
|
99 |
-
"r": "ɹ",
|
100 |
-
"w": "w",
|
101 |
-
"wh": "ʍ",
|
102 |
-
"y": "j",
|
103 |
-
"el": "l", # Simplified from 'l̩'
|
104 |
-
# Special
|
105 |
-
"epi": "", # Remove epenthetic silence
|
106 |
-
"h#": "", # Remove start/end silence
|
107 |
-
"pau": "", # Remove pause
|
108 |
-
}
|
109 |
-
|
110 |
-
def __init__(self, timit_path: Path):
|
111 |
-
self.timit_path = timit_path
|
112 |
-
self._zip_ = None
|
113 |
-
print(f"TimitDataManager initialized with path: {self.timit_path.absolute()}")
|
114 |
-
if not self.timit_path.exists():
|
115 |
-
raise FileNotFoundError(
|
116 |
-
f"TIMIT dataset not found at {self.timit_path.absolute()}. Try running ./scripts/download_data_lfs.sh again."
|
117 |
-
)
|
118 |
-
else:
|
119 |
-
print("TIMIT dataset file exists!")
|
120 |
-
|
121 |
-
@property
|
122 |
-
def _zip(self):
|
123 |
-
if not self._zip_:
|
124 |
-
self._zip_ = zipfile.ZipFile(self.timit_path, "r")
|
125 |
-
return self._zip_
|
126 |
-
|
127 |
-
def get_file_list(self, subset: str) -> list[str]:
|
128 |
-
"""Get list of WAV files for given subset"""
|
129 |
-
files = [
|
130 |
-
f
|
131 |
-
for f in self._zip.namelist()
|
132 |
-
if f.endswith(".WAV") and subset.lower() in f.lower()
|
133 |
-
]
|
134 |
-
print(f"Found {len(files)} WAV files in {subset} subset")
|
135 |
-
if files:
|
136 |
-
print("First 3 files:", files[:3])
|
137 |
-
return files
|
138 |
-
|
139 |
-
def load_audio(self, filename: str) -> torch.Tensor:
|
140 |
-
"""Load and preprocess audio file"""
|
141 |
-
with self._zip.open(filename) as wav_file:
|
142 |
-
waveform, sample_rate = torchaudio.load(wav_file) # type: ignore
|
143 |
-
|
144 |
-
if waveform.shape[0] > 1:
|
145 |
-
waveform = torch.mean(waveform, dim=0, keepdim=True)
|
146 |
-
|
147 |
-
if sample_rate != 16000:
|
148 |
-
waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
|
149 |
-
|
150 |
-
waveform = (waveform - waveform.mean()) / (waveform.std() + 1e-7)
|
151 |
-
|
152 |
-
if waveform.dim() == 1:
|
153 |
-
waveform = waveform.unsqueeze(0)
|
154 |
-
|
155 |
-
return waveform
|
156 |
-
|
157 |
-
def get_phonemes(self, filename: str) -> str:
|
158 |
-
"""Get cleaned phoneme sequence from PHN file and convert to IPA"""
|
159 |
-
phn_file = filename.replace(".WAV", ".PHN")
|
160 |
-
with self._zip.open(phn_file) as f:
|
161 |
-
phonemes = []
|
162 |
-
for line in f.read().decode("utf-8").splitlines():
|
163 |
-
if line.strip():
|
164 |
-
_, _, phone = line.split()
|
165 |
-
phone = self._remove_stress_mark(phone)
|
166 |
-
# Convert to IPA instead of using simplify_timit
|
167 |
-
ipa = self._TIMIT_TO_IPA.get(phone.lower(), "")
|
168 |
-
if ipa:
|
169 |
-
phonemes.append(ipa)
|
170 |
-
return "".join(phonemes) # Join without spaces for IPA
|
171 |
-
|
172 |
-
def _remove_stress_mark(self, text: str) -> str:
|
173 |
-
"""Removes the combining double inverted breve (͡) from text"""
|
174 |
-
if not isinstance(text, str):
|
175 |
-
raise TypeError("Input must be string")
|
176 |
-
return text.replace("͡", "")
|
177 |
-
|
178 |
-
|
179 |
-
# Initialize data managers
|
180 |
-
timit_manager = TimitDataManager(TIMIT_PATH)
|
app/data/test/cache-38f74914f01da443.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:a7097497f3a64b59d868eb2b3dadf6887b383555398dec8f3b72e75a295ddb5a
|
3 |
+
size 1248
|
app/data/test/cache-43bad43a3f17100a.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:6a87f7da6c1210c5efca97e285fdf608b1101e8c6b506a03812ecf082f089aa0
|
3 |
+
size 1248
|
app/data/test/cache-7fc832a0865b46e3.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:966ac1866fb81a68bcb2269ad1293dd2c045022558b6824a87bb66cada9ff28a
|
3 |
+
size 1248
|
app/data/test/cache-8e3b20205f12c8bf.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:78881cfe43c3a668b24c2269adc6219724a7fec0838bcdf74b71e96a583bf0c6
|
3 |
+
size 1248
|
app/data/test/cache-9a41aaef1a199c0a.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d5c1ab32866ac66f93c5798a888db2d32ca3638aa119de45d587325c2d90964d
|
3 |
+
size 1248
|
app/data/test/cache-9a81afba5c72d77e.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:aee3c9a01bfb57a914f31c6255c55cdd42c5cbda23fab357fb80c32710e92389
|
3 |
+
size 1248
|
app/data/test/cache-bf2efb6be770547b.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:e2b2b01c7d81595b4ba5e97902db7bf2ef353eacebf9912a930d16570948cd2d
|
3 |
+
size 1248
|
app/data/test/cache-ceccabba78df3ad3.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:312a7ed183b7aabac6a4553b31fd55dcd6a4af9a1627978f8117278c540885da
|
3 |
+
size 1248
|
app/data/test/cache-d8c639c50adcd3ec.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:4347f285083472be0661457b0b6cdf927a302e556a5584d10cdedd15ca936919
|
3 |
+
size 1248
|
app/data/test/cache-f9690e73716e8fdd.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d0c055e60b8afcc4c763157f34d0d17f683f0f8b578116eaf9e604a3d178d9e5
|
3 |
+
size 1248
|
app/data/test/data-00000-of-00001.arrow
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:510501aa7be7ece974c2e9feaaad94ec5d38a7fe4e35dee9b3bf2ee9a485062c
|
3 |
+
size 53582720
|
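The `.arrow` files above are committed as Git LFS pointers, so each diff shows only three fields: the spec version, the object hash, and the size in bytes. A minimal sketch of reading such a pointer, assuming the three-line layout shown above; `parse_lfs_pointer` is a hypothetical helper for illustration and is not part of this repository:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the text of a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field is "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the data-00000-of-00001.arrow diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:510501aa7be7ece974c2e9feaaad94ec5d38a7fe4e35dee9b3bf2ee9a485062c\n"
    "size 53582720\n"
)
print(pointer["hash_algo"], pointer["size"])
```

The size field shows that the test set itself is about 53.6 MB, while each `cache-*.arrow` pointer refers to a 1248-byte object.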
app/data/test/dataset_info.json
ADDED
@@ -0,0 +1,19 @@
|
1 |
+
{
|
2 |
+
"citation": "",
|
3 |
+
"description": "",
|
4 |
+
"features": {
|
5 |
+
"audio": {
|
6 |
+
"_type": "Audio"
|
7 |
+
},
|
8 |
+
"ipa": {
|
9 |
+
"dtype": "string",
|
10 |
+
"_type": "Value"
|
11 |
+
},
|
12 |
+
"dataset": {
|
13 |
+
"dtype": "string",
|
14 |
+
"_type": "Value"
|
15 |
+
}
|
16 |
+
},
|
17 |
+
"homepage": "",
|
18 |
+
"license": ""
|
19 |
+
}
|
app/data/test/state.json
ADDED
@@ -0,0 +1,17 @@
|
1 |
+
{
|
2 |
+
"_data_files": [
|
3 |
+
{
|
4 |
+
"filename": "data-00000-of-00001.arrow"
|
5 |
+
}
|
6 |
+
],
|
7 |
+
"_fingerprint": "8693a894a9182281",
|
8 |
+
"_format_columns": [
|
9 |
+
"audio",
|
10 |
+
"ipa",
|
11 |
+
"dataset"
|
12 |
+
],
|
13 |
+
"_format_kwargs": {},
|
14 |
+
"_format_type": null,
|
15 |
+
"_output_all_columns": false,
|
16 |
+
"_split": null
|
17 |
+
}
|
app/hf.py
ADDED
@@ -0,0 +1,111 @@
|
1 |
+
# This module handles interfacing with the Hugging Face Hub API
|
2 |
+
|
3 |
+
from typing import Literal
|
4 |
+
from datetime import datetime
|
5 |
+
|
6 |
+
from huggingface_hub import HfApi
|
7 |
+
from huggingface_hub.errors import RepositoryNotFoundError
|
8 |
+
from datasets import load_dataset, concatenate_datasets, Dataset, Features, Value
|
9 |
+
from datasets.exceptions import DatasetNotFoundError
|
10 |
+
|
11 |
+
api = HfApi()
|
12 |
+
|
13 |
+
LEADERBOARD_ID = "KoelLabs/_IPA-TRANSCRIPTION-EN-SCORES"
|
14 |
+
LEADERBOARD_FEATURES = Features(
|
15 |
+
{
|
16 |
+
"display_name": Value("string"),
|
17 |
+
"repo_id": Value("string"),
|
18 |
+
"repo_hash": Value("string"),
|
19 |
+
"repo_last_modified": Value("timestamp[s, tz=UTC]"),
|
20 |
+
"submission_timestamp": Value("timestamp[s, tz=UTC]"),
|
21 |
+
"average_per": Value("float32"),
|
22 |
+
"average_fer": Value("float32"),
|
23 |
+
"url": Value("string"),
|
24 |
+
"fer_TIMIT": Value("float32"),
|
25 |
+
"fer_EpaDB": Value("float32"),
|
26 |
+
"fer_PSST": Value("float32"),
|
27 |
+
"fer_SpeechOcean": Value("float32"),
|
28 |
+
"fer_ISLE": Value("float32"),
|
29 |
+
}
|
30 |
+
)
|
31 |
+
LEADERBOARD_DEFAULTS = {
|
32 |
+
"url": "",
|
33 |
+
"fer_TIMIT": None,
|
34 |
+
"fer_EpaDB": None,
|
35 |
+
"fer_PSST": None,
|
36 |
+
"fer_SpeechOcean": None,
|
37 |
+
"fer_ISLE": None,
|
38 |
+
}
|
39 |
+
|
40 |
+
|
41 |
+
def get_repo_info(
|
42 |
+
repo_id, type: Literal["model", "dataset", "space"] = "model"
|
43 |
+
) -> tuple[str, datetime]:
|
44 |
+
try:
|
45 |
+
repo_info = api.repo_info(repo_id=repo_id, repo_type=type)
|
46 |
+
return repo_info.sha, repo_info.last_modified # type: ignore
|
47 |
+
except RepositoryNotFoundError:
|
48 |
+
return "", datetime(year=1970, month=1, day=1)
|
49 |
+
|
50 |
+
|
51 |
+
def get_or_create_leaderboard() -> Dataset:
|
52 |
+
modified = False
|
53 |
+
try:
|
54 |
+
dataset: Dataset = load_dataset(LEADERBOARD_ID)["train"] # type: ignore
|
55 |
+
except DatasetNotFoundError:
|
56 |
+
empty_data = {col: [] for col in LEADERBOARD_FEATURES.keys()}
|
57 |
+
dataset = Dataset.from_dict(empty_data, features=LEADERBOARD_FEATURES)
|
58 |
+
modified = True
|
59 |
+
except ValueError:
|
60 |
+
empty_data = {col: [] for col in LEADERBOARD_FEATURES.keys()}
|
61 |
+
dataset = Dataset.from_dict(empty_data, features=LEADERBOARD_FEATURES)
|
62 |
+
|
63 |
+
for col in LEADERBOARD_FEATURES.keys():
|
64 |
+
if col not in dataset.column_names:
|
65 |
+
modified = True
|
66 |
+
dataset = dataset.add_column(col, [LEADERBOARD_DEFAULTS.get(col)] * len(dataset)) # type: ignore
|
67 |
+
dataset = dataset.cast_column(col, feature=LEADERBOARD_FEATURES[col])
|
68 |
+
|
69 |
+
if modified:
|
70 |
+
dataset.push_to_hub(LEADERBOARD_ID, private=True)
|
71 |
+
|
72 |
+
return dataset
|
73 |
+
|
74 |
+
|
75 |
+
def add_leaderboard_entry(
|
76 |
+
display_name: str,
|
77 |
+
repo_id: str,
|
78 |
+
repo_hash: str,
|
79 |
+
repo_last_modified: datetime,
|
80 |
+
submission_timestamp: datetime,
|
81 |
+
average_per: float,
|
82 |
+
average_fer: float,
|
83 |
+
url: str,
|
84 |
+
per_dataset_fers: dict = {},
|
85 |
+
):
|
86 |
+
existing_dataset = get_or_create_leaderboard()
|
87 |
+
new_row = Dataset.from_dict(
|
88 |
+
dict(
|
89 |
+
display_name=[display_name],
|
90 |
+
repo_id=[repo_id],
|
91 |
+
repo_hash=[repo_hash],
|
92 |
+
repo_last_modified=[repo_last_modified.replace(microsecond=0)],
|
93 |
+
submission_timestamp=[submission_timestamp.replace(microsecond=0)],
|
94 |
+
average_per=[average_per],
|
95 |
+
average_fer=[average_fer],
|
96 |
+
url=[url],
|
97 |
+
fer_TIMIT=[per_dataset_fers.get("TIMIT")],
|
98 |
+
fer_EpaDB=[per_dataset_fers.get("EpaDB")],
|
99 |
+
fer_PSST=[per_dataset_fers.get("PSST")],
|
100 |
+
fer_SpeechOcean=[per_dataset_fers.get("SpeechOcean")],
|
101 |
+
fer_ISLE=[per_dataset_fers.get("ISLE")],
|
102 |
+
),
|
103 |
+
features=LEADERBOARD_FEATURES,
|
104 |
+
)
|
105 |
+
combined_dataset = concatenate_datasets([existing_dataset, new_row])
|
106 |
+
combined_dataset.push_to_hub(LEADERBOARD_ID, private=True)
|
107 |
+
|
108 |
+
|
109 |
+
if __name__ == "__main__":
|
110 |
+
print(get_repo_info(LEADERBOARD_ID, type="dataset"))
|
111 |
+
print(get_or_create_leaderboard().to_pandas().head(5)) # type: ignore
|
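The schema backfill in `get_or_create_leaderboard` (add every expected column that is missing, fill it with its default, and push only if something changed) can be pictured with plain dictionaries. This is a simplified sketch that does not use the `datasets` library; `backfill_columns` and the dict-of-lists table are assumptions made for illustration:

```python
def backfill_columns(
    table: dict[str, list], expected: list[str], defaults: dict
) -> tuple[dict[str, list], bool]:
    """Return a copy of `table` with every expected column present.

    Missing columns are filled with their default (None when no default
    is registered). The boolean reports whether anything was added.
    """
    n_rows = len(next(iter(table.values()), []))
    out = dict(table)
    modified = False
    for col in expected:
        if col not in out:
            out[col] = [defaults.get(col)] * n_rows
            modified = True
    return out, modified


# A leaderboard saved before the per-dataset FER columns existed.
old_table = {"display_name": ["model-a", "model-b"]}
expected_columns = ["display_name", "url", "fer_TIMIT"]
column_defaults = {"url": "", "fer_TIMIT": None}

new_table, changed = backfill_columns(old_table, expected_columns, column_defaults)
same_table, changed_again = backfill_columns(new_table, expected_columns, column_defaults)
```

Running the backfill a second time reports no change, which mirrors why the module only calls `push_to_hub` when `modified` is set.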
app/inference.py
CHANGED
@@ -1,162 +1,50 @@
|
|
1 |
-
# This module handles model inference
|
2 |
-
|
3 |
-
from datetime import datetime
|
4 |
-
from typing import Optional
|
5 |
|
6 |
import torch
|
7 |
from transformers import AutoProcessor, AutoModelForCTC
|
8 |
|
9 |
-
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
class ModelManager:
|
17 |
-
"""Handles model loading and inference"""
|
18 |
-
|
19 |
-
def __init__(self):
|
20 |
-
self.models = {}
|
21 |
-
self.processors = {}
|
22 |
-
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
23 |
-
self.batch_size = 32
|
24 |
-
|
25 |
-
def get_model_and_processor(self, model_name: str):
|
26 |
-
"""Get or load model and processor"""
|
27 |
-
if model_name not in self.models:
|
28 |
-
print("Loading processor with phoneme tokenizer...")
|
29 |
-
processor = AutoProcessor.from_pretrained(model_name)
|
30 |
-
|
31 |
-
print("Loading model...", {model_name})
|
32 |
-
model = AutoModelForCTC.from_pretrained(model_name).to(self.device)
|
33 |
-
|
34 |
-
self.models[model_name] = model
|
35 |
-
self.processors[model_name] = processor
|
36 |
-
|
37 |
-
return self.models[model_name], self.processors[model_name]
|
38 |
-
|
39 |
-
def transcribe(self, audio_list: list[torch.Tensor], model_name: str) -> list[str]:
|
40 |
-
"""Transcribe a batch of audio using specified model"""
|
41 |
-
model, processor = self.get_model_and_processor(model_name)
|
42 |
-
if not model or not processor:
|
43 |
-
raise Exception("Model and processor not loaded")
|
44 |
-
|
45 |
-
# Process audio in batches
|
46 |
-
all_predictions = []
|
47 |
-
for i in range(0, len(audio_list), self.batch_size):
|
48 |
-
batch_audio = audio_list[i : i + self.batch_size]
|
49 |
-
|
50 |
-
# Pad sequence within batch
|
51 |
-
max_length = max(audio.shape[-1] for audio in batch_audio)
|
52 |
-
padded_audio = torch.zeros((len(batch_audio), 1, max_length))
|
53 |
-
attention_mask = torch.zeros((len(batch_audio), max_length))
|
54 |
-
|
55 |
-
for j, audio in enumerate(batch_audio):
|
56 |
-
padded_audio[j, :, : audio.shape[-1]] = audio
|
57 |
-
attention_mask[j, : audio.shape[-1]] = 1
|
58 |
-
|
59 |
-
# Process batch
|
60 |
-
inputs = processor(
|
61 |
-
padded_audio.squeeze(1).numpy(),
|
62 |
-
sampling_rate=16000,
|
63 |
-
return_tensors="pt",
|
64 |
-
padding=True,
|
65 |
-
)
|
66 |
-
|
67 |
-
input_values = inputs.input_values.to(self.device)
|
68 |
-
attention_mask = inputs.get("attention_mask", attention_mask).to(
|
69 |
-
self.device
|
70 |
-
)
|
71 |
-
|
72 |
-
with torch.no_grad():
|
73 |
-
outputs = model(
|
74 |
-
input_values=input_values, attention_mask=attention_mask
|
75 |
-
)
|
76 |
-
logits = outputs.logits
|
77 |
-
predicted_ids = torch.argmax(logits, dim=-1)
|
78 |
-
predictions = processor.batch_decode(
|
79 |
-
predicted_ids, skip_special_tokens=True
|
80 |
-
)
|
81 |
-
predictions = [pred.replace(" ", "") for pred in predictions]
|
82 |
-
all_predictions.extend(predictions)
|
83 |
-
|
84 |
-
return all_predictions
|
85 |
-
|
86 |
-
|
87 |
-
def evaluate_model(
|
88 |
-
model_name: str,
|
89 |
-
subset: str = "test",
|
90 |
-
max_samples: Optional[int] = None,
|
91 |
-
):
|
92 |
-
"""Evaluate model on TIMIT dataset"""
|
93 |
-
|
94 |
-
files = timit_manager.get_file_list(subset)
|
95 |
-
if max_samples:
|
96 |
-
files = files[:max_samples]
|
97 |
-
|
98 |
-
results = []
|
99 |
-
total_per = total_pwed = 0
|
100 |
-
|
101 |
-
# Process files in batches
|
102 |
-
batch_size = model_manager.batch_size
|
103 |
-
for i in range(0, len(files), batch_size):
|
104 |
-
batch_files = files[i : i + batch_size]
|
105 |
-
|
106 |
-
# Load batch audio and ground truth
|
107 |
-
batch_audio = []
|
108 |
-
batch_ground_truth = []
|
109 |
-
for wav_file in batch_files:
|
110 |
-
audio = timit_manager.load_audio(wav_file)
|
111 |
-
ground_truth = timit_manager.get_phonemes(wav_file)
|
112 |
-
batch_audio.append(audio)
|
113 |
-
batch_ground_truth.append(ground_truth)
|
114 |
|
115 |
-
|
116 |
-
|
117 |
|
118 |
-
|
119 |
-
|
120 |
-
zip(batch_files, predictions, batch_ground_truth)
|
121 |
-
):
|
122 |
-
metrics = phone_errors.compute(
|
123 |
-
predictions=[prediction],
|
124 |
-
references=[ground_truth],
|
125 |
-
is_normalize_pfer=True,
|
126 |
-
)
|
127 |
|
128 |
-
|
129 |
-
|
130 |
|
131 |
-
results.append(
|
132 |
-
{
|
133 |
-
"file": wav_file,
|
134 |
-
"ground_truth": ground_truth,
|
135 |
-
"prediction": prediction,
|
136 |
-
"per": per,
|
137 |
-
"pwed": pwed,
|
138 |
-
}
|
139 |
-
)
|
140 |
|
141 |
-
|
142 |
-
|
|
|
|
|
|
|
143 |
|
144 |
-
if not results:
|
145 |
-
raise Exception("No files were successfully processed")
|
146 |
|
147 |
-
|
148 |
-
|
|
|
|
|
149 |
|
150 |
-
return {
|
151 |
-
"model": model_name,
|
152 |
-
"subset": subset,
|
153 |
-
"num_files": len(results),
|
154 |
-
"average_per": avg_per,
|
155 |
-
"average_pwed": avg_pwed,
|
156 |
-
"detailed_results": results[:5],
|
157 |
-
"timestamp": datetime.now().isoformat(),
|
158 |
-
}
|
159 |
|
160 |
|
161 |
-
|
162 |
-
|
|
|
1 |
+
# This module handles model inference
|
|
|
|
|
|
|
2 |
|
3 |
import torch
|
4 |
from transformers import AutoProcessor, AutoModelForCTC
|
5 |
|
6 |
+
DEVICE = (
|
7 |
+
"cuda"
|
8 |
+
if torch.cuda.is_available()
|
9 |
+
else "mps" if torch.backends.mps.is_available() else "cpu"
|
10 |
+
)
|
11 |
|
12 |
+
# set espeak library path for macOS
|
13 |
+
import sys
|
14 |
|
15 |
+
if sys.platform == "darwin":
|
16 |
+
from phonemizer.backend.espeak.wrapper import EspeakWrapper
|
17 |
|
18 |
+
_ESPEAK_LIBRARY = "/opt/homebrew/Cellar/espeak/1.48.04_1/lib/libespeak.1.1.48.dylib"
|
19 |
+
EspeakWrapper.set_library(_ESPEAK_LIBRARY)
|
20 |
|
21 |
|
22 |
+
def clear_cache():
|
23 |
+
if torch.cuda.is_available():
|
24 |
+
torch.cuda.empty_cache()
|
25 |
+
torch.cuda.ipc_collect()
|
26 |
+
torch.mps.empty_cache()
|
27 |
|
|
|
|
|
28 |
|
29 |
+
def load_model(model_id, device=DEVICE):
|
30 |
+
processor = AutoProcessor.from_pretrained(model_id)
|
31 |
+
model = AutoModelForCTC.from_pretrained(model_id).to(device)
|
32 |
+
return model, processor
|
33 |
|
34 |
|
35 |
+
def transcribe(audio, model, processor) -> str:
|
36 |
+
input_values = (
|
37 |
+
processor(
|
38 |
+
[audio],
|
39 |
+
sampling_rate=processor.feature_extractor.sampling_rate,
|
40 |
+
return_tensors="pt",
|
41 |
+
padding=True,
|
42 |
+
)
|
43 |
+
.input_values.type(torch.float32)
|
44 |
+
.to(model.device)
|
45 |
+
)
|
46 |
+
with torch.no_grad():
|
47 |
+
logits = model(input_values).logits
|
48 |
|
49 |
+
predicted_ids = torch.argmax(logits, dim=-1)
|
50 |
+
return processor.decode(predicted_ids[0])
|
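The new `transcribe` function takes the argmax over the logits for each frame and hands the resulting ids to `processor.decode`. For a CTC model, that decoding step conventionally collapses consecutive repeated ids and then drops the blank token. The sketch below shows this greedy rule in plain Python; the toy vocabulary, the blank id of 0, and the name `greedy_ctc_decode` are assumptions for illustration, not the tokenizer's actual implementation:

```python
from itertools import groupby


def greedy_ctc_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame ids, then remove CTC blanks."""
    # groupby merges runs of identical consecutive ids into one entry.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)


# Hypothetical vocabulary: id 0 is the blank token.
vocab = {0: "", 1: "h", 2: "æ", 3: "d"}

# Per-frame argmax ids, as torch.argmax(logits, dim=-1) would produce.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 3]
decoded = greedy_ctc_decode(frames, vocab)
```

A blank between two identical ids keeps them as separate symbols, so `[1, 0, 1]` decodes to two `h` tokens rather than one.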
app/metrics.py
ADDED
@@ -0,0 +1,33 @@
|
1 |
+
# This module defines evaluation metrics
|
2 |
+
|
3 |
+
from yaml import warnings
|
4 |
+
|
5 |
+
warnings({"YAMLLoadWarning": False})
|
6 |
+
|
7 |
+
import panphon
|
8 |
+
import panphon.distance
|
9 |
+
|
10 |
+
ft = panphon.FeatureTable()
|
11 |
+
panphon_dist = panphon.distance.Distance()
|
12 |
+
inverse_double_weight_sum = 1 / (sum(ft.weights) * 2)
|
13 |
+
|
14 |
+
|
15 |
+
def per(prediction, ground_truth):
|
16 |
+
"""
|
17 |
+
Phoneme Error Rate: the number of edits (substitutions, insertions, deletions)
|
18 |
+
needed to transform the prediction into the ground truth divided by the length of the ground truth.
|
19 |
+
"""
|
20 |
+
return panphon_dist.fast_levenshtein_distance(prediction, ground_truth) / len(
|
21 |
+
ground_truth
|
22 |
+
)
|
23 |
+
|
24 |
+
|
25 |
+
def fer(prediction, ground_truth):
|
26 |
+
"""
|
27 |
+
Feature Error Rate: the edits, each weighted by the articulatory features it changes, summed up and divided by the length of the ground truth.
|
28 |
+
"""
|
29 |
+
return (
|
30 |
+
inverse_double_weight_sum
|
31 |
+
* panphon_dist.weighted_feature_edit_distance(ground_truth, prediction)
|
32 |
+
/ len(ground_truth)
|
33 |
+
)
|
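The `per` docstring above describes an edit distance normalized by the reference length. The sketch below spells that out with a character-level Levenshtein distance in pure Python. It is an approximation for illustration only: `levenshtein` and `per_sketch` are hypothetical names, and panphon may normalize or segment the strings differently than a plain character comparison does.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # delete ca
                    curr[j - 1] + 1,  # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, free on a match
                )
            )
        prev = curr
    return prev[-1]


def per_sketch(prediction: str, ground_truth: str) -> float:
    """Edit distance divided by the length of the ground truth."""
    if not ground_truth:
        # The division is undefined for an empty reference.
        raise ValueError("ground_truth must not be empty")
    return levenshtein(prediction, ground_truth) / len(ground_truth)


# One substitution over a three-symbol reference.
score = per_sketch("hæt", "hæd")
```

Because the denominator is the reference length, a prediction much longer than the reference can score above 1.0.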
app/phone_metrics.py
DELETED
@@ -1,108 +0,0 @@
|
|
1 |
-
"""
|
2 |
-
This module implements phone error metrics based on the work from ginic/phone_errors.
|
3 |
-
Original implementation: https://huggingface.co/spaces/ginic/phone_errors
|
4 |
-
|
5 |
-
Citation:
|
6 |
-
@inproceedings{Mortensen-et-al:2016,
|
7 |
-
author = {David R. Mortensen and
|
8 |
-
Patrick Littell and
|
9 |
-
Akash Bharadwaj and
|
10 |
-
Kartik Goyal and
|
11 |
-
Chris Dyer and
|
12 |
-
Lori S. Levin},
|
13 |
-
title = {PanPhon: {A} Resource for Mapping {IPA} Segments to Articulatory Feature Vectors},
|
14 |
-
booktitle = {Proceedings of {COLING} 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
|
15 |
-
pages = {3475--3484},
|
16 |
-
publisher = {{ACL}},
|
17 |
-
year = {2016}
|
18 |
-
}
|
19 |
-
"""
|
20 |
-
|
21 |
-
import numpy as np
|
22 |
-
import panphon.distance
|
23 |
-
|
24 |
-
|
25 |
-
class PhoneErrorMetrics:
|
26 |
-
def __init__(self, feature_model: str = "segment"):
|
27 |
-
"""Initialize the phone error metrics calculator.
|
28 |
-
|
29 |
-
Args:
|
30 |
-
feature_model (str): panphon feature parsing model ("strict", "permissive", or "segment")
|
31 |
-
"""
|
32 |
-
self.distance_computer = panphon.distance.Distance(feature_model=feature_model)
|
33 |
-
|
34 |
-
def _phone_error_rate(self, prediction: str, reference: str) -> float:
|
35 |
-
"""Compute phone error rate between prediction and reference.
|
36 |
-
|
37 |
-
Args:
|
38 |
-
prediction (str): Predicted IPA string
|
39 |
-
reference (str): Reference IPA string
|
40 |
-
|
41 |
-
Returns:
|
42 |
-
float: Phone error rate
|
43 |
-
"""
|
44 |
-
if not reference:
|
45 |
-
raise ValueError("Reference string cannot be empty")
|
46 |
-
|
47 |
-
pred_phones = self.distance_computer.fm.ipa_segs(prediction)
|
48 |
-
ref_phones = self.distance_computer.fm.ipa_segs(reference)
|
49 |
-
|
50 |
-
phone_edits = self.distance_computer.min_edit_distance(
|
51 |
-
lambda x: 1, # deletion cost
|
52 |
-
lambda x: 1, # insertion cost
|
53 |
-
lambda x, y: 0 if x == y else 1, # substitution cost
|
54 |
-
[[]],
|
55 |
-
pred_phones,
|
56 |
-
ref_phones,
|
57 |
-
)
|
58 |
-
|
59 |
-
return phone_edits / len(ref_phones)
|
60 |
-
|
61 |
-
def compute(
|
62 |
-
self,
|
63 |
-
predictions: list[str],
|
64 |
-
references: list[str],
|
65 |
-
is_normalize_pfer: bool = False,
|
66 |
-
) -> dict:
|
67 |
-
"""Compute phone error metrics between predictions and references.
|
68 |
-
|
69 |
-
Args:
|
70 |
-
predictions (List[str]): List of predicted IPA strings
|
71 |
-
references (List[str]): List of reference IPA strings
|
72 |
-
is_normalize_pfer (bool): Whether to normalize phone feature error rates
|
73 |
-
|
74 |
-
Returns:
|
75 |
-
Dict containing:
|
76 |
-
- phone_error_rates: List of PER for each pair
|
77 |
-
- mean_phone_error_rate: Average PER
|
78 |
-
- phone_feature_error_rates: List of PFER for each pair
|
79 |
-
- mean_phone_feature_error_rate: Average PFER
|
80 |
-
- feature_error_rates: List of FER for each pair
|
81 |
-
- mean_feature_error_rate: Average FER
|
82 |
-
"""
|
83 |
-
phone_error_rates = []
|
84 |
-
feature_error_rates = []
|
85 |
-
hamming_distances = []
|
86 |
-
|
87 |
-
for pred, ref in zip(predictions, references):
|
88 |
-
if is_normalize_pfer:
|
89 |
-
hd = self.distance_computer.hamming_feature_edit_distance_div_maxlen(
|
90 |
-
pred, ref
|
91 |
-
)
|
92 |
-
else:
|
93 |
-
hd = self.distance_computer.hamming_feature_edit_distance(pred, ref)
|
94 |
-
|
95 |
-
hamming_distances.append(hd)
|
96 |
-
per = self._phone_error_rate(pred, ref)
|
97 |
-
phone_error_rates.append(per)
|
98 |
-
fer = self.distance_computer.feature_error_rate(pred, ref)
|
99 |
-
feature_error_rates.append(fer)
|
100 |
-
|
101 |
-
return {
|
102 |
-
"phone_error_rates": phone_error_rates,
|
103 |
-
"mean_phone_error_rate": float(np.mean(phone_error_rates)),
|
104 |
-
"phone_feature_error_rates": hamming_distances,
|
105 |
-
"mean_phone_feature_error_rate": float(np.mean(hamming_distances)),
|
106 |
-
"feature_error_rates": feature_error_rates,
|
107 |
-
"mean_feature_error_rate": float(np.mean(feature_error_rates)),
|
108 |
-
}
|
app/queue/leaderboard.json
DELETED
@@ -1,192 +0,0 @@
|
|
1 |
-
[
|
2 |
-
{
|
3 |
-
"submission_id": "8e6a3a00-59fa-4a24-861d-a132a8212658",
|
4 |
-
"submission_name": "facebook espeak",
|
5 |
-
"model": "facebook/wav2vec2-lv-60-espeak-cv-ft",
|
6 |
-
"average_per": 0.33667301260691423,
|
7 |
-
"average_pwed": 0.1276725657099669,
|
8 |
-
"subset": "timit-test",
|
9 |
-
"github_url": "https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md",
|
10 |
-
"submission_date": "2024-12-05T07:32:06.850230"
|
11 |
-
},
|
12 |
-
{
|
13 |
-
"submission_id": "70aceb68-ad86-4a83-9998-08adb27b4d5c",
|
14 |
-
"submission_name": "english phoneme model",
|
15 |
-
"model": "KoelLabs/xlsr-timit-b0",
|
16 |
-
"average_per": 0.12572285528714347,
|
17 |
-
"average_pwed": 0.06476636812791145,
|
18 |
-
"subset": "timit-test",
|
19 |
-
"github_url": "https://github.com/KoelLabs/",
|
20 |
-
"submission_date": "2024-12-05T08:25:24.982477"
|
21 |
-
},
|
22 |
-
{
|
23 |
-
"submission_id": "80b57299-b3ab-4caf-ac4a-898c8398046e",
|
24 |
-
"submission_name": "speech 31 model",
|
25 |
-
"model": "speech31/wav2vec2-large-TIMIT-IPA",
|
26 |
-
"average_per": 0.4415425496841929,
|
27 |
-
"average_pwed": 0.18625930002594002,
|
28 |
-
"subset": "timit-test",
|
29 |
-
"github_url": "https://huggingface.co/speech31/wav2vec2-large-TIMIT-IPA2",
|
30 |
-
"submission_date": "2024-12-05T09:36:14.570315"
|
31 |
-
},
|
32 |
-
{
|
33 |
-
"submission_id": "0cbcab0a-bd07-421f-82a0-480c9507a214",
|
34 |
-
"submission_name": "jubiliano model wav2vec2",
|
35 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5",
|
36 |
-
"average_per": 0.6318471187460027,
|
37 |
-
"average_pwed": 0.222932144739126,
|
38 |
-
"subset": "timit-test",
|
39 |
-
"github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces/tree/d5312009d8e620b183c334dfdd9ffc6b4f06f8c1",
|
40 |
-
"submission_date": "2024-12-05T10:17:21.334530"
|
41 |
-
},
|
42 |
-
{
|
43 |
-
"submission_id": "0fc29c54-3db2-46b6-aeee-c96484306751",
|
44 |
-
"submission_name": "xlsr 53 model",
|
45 |
-
"model": "facebook/wav2vec2-xlsr-53-espeak-cv-ft",
|
46 |
-
"average_per": 0.348845592557092,
|
47 |
-
"average_pwed": 0.1386742019529415,
|
48 |
-
"subset": "timit-test",
|
49 |
-
"github_url": "https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md",
|
50 |
-
"submission_date": "2024-12-05T10:34:26.157054"
|
51 |
-
},
|
52 |
-
{
|
53 |
-
"submission_id": "a23026ec-acac-4481-9761-f9368b4b94f1",
|
54 |
-
"submission_name": "ginic model wav2vec2 finetuned on buckeye",
|
55 |
-
"model": "ginic/hyperparam_tuning_1_wav2vec2-large-xlsr-buckeye-ipa",
|
56 |
-
"average_per": 0.2766466385175833,
|
57 |
-
"average_pwed": 0.10410683992600853,
|
58 |
-
"subset": "timit-test",
|
59 |
-
"github_url": "https://huggingface.co/ginic/vary_individuals_old_only_1_wav2vec2-large-xlsr-buckeye-ipa",
|
60 |
-
"submission_date": "2024-12-05T11:06:07.984825"
|
61 |
-
},
|
62 |
-
{
|
63 |
-
"submission_id": "e3bbf521-cc32-43a6-bf1c-5ddc6bce04ab",
|
64 |
-
"submission_name": "koel labs initial ",
|
65 |
-
"model": "KoelLabs/xlsr-timit-a0",
|
66 |
-
"average_per": 0.24242141955346685,
|
67 |
-
"average_pwed": 0.17395311976938,
|
68 |
-
"subset": "timit-test",
|
69 |
-
"github_url": "https://github.com/KoelLabs/ML/",
|
70 |
-
"submission_date": "2024-12-12T16:07:25.391145"
|
71 |
-
},
|
72 |
-
{
|
73 |
-
"submission_id": "02f223d4-7b98-4613-9377-19b74defe308",
|
74 |
-
"submission_name": "wav2vec2 ipa eng ",
|
75 |
-
"model": "snu-nia-12/wav2vec2-large_nia12_phone-ipa_english",
|
76 |
-
"average_per": 0.4847029843149011,
|
77 |
-
"average_pwed": 0.2072006544586948,
|
78 |
-
"subset": "timit-test",
|
79 |
-
"github_url": null,
|
80 |
-
"submission_date": "2024-12-18T22:01:20.855881"
|
81 |
-
},
|
82 |
-
{
|
83 |
-
"submission_id": "bed08468-42c7-459f-a46d-49ead50abfbc",
|
84 |
-
"submission_name": "fine-tuned version of facebook/wav2vec2-xls-r-300m on the Timit dataset",
|
85 |
-
"model": "vitouphy/wav2vec2-xls-r-300m-timit-phoneme",
|
86 |
-
"average_per": 0.2561961414705681,
|
87 |
-
"average_pwed": 0.1378394393452702,
|
88 |
-
"subset": "timit-test",
|
89 |
-
"github_url": "https://www.kaggle.com/code/vitouphy/phoneme-recognition-with-wav2vec2",
|
90 |
-
"submission_date": "2024-12-18T22:50:59.627338"
|
91 |
-
},
|
92 |
-
{
|
93 |
-
"submission_id": "4086072e-9368-442f-97cd-1fda6bf6656e",
|
94 |
-
"submission_name": "wav2vec2 model",
|
95 |
-
"model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa-plus-2000",
|
96 |
-
"average_per": 0.6479484324708775,
|
97 |
-
"average_pwed": 0.18710002665151734,
|
98 |
-
"subset": "timit-test",
|
99 |
-
"github_url": "https://huggingface.co/ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
|
100 |
-
"submission_date": "2024-12-18T23:29:27.322286"
|
101 |
-
},
|
102 |
-
{
|
103 |
-
"submission_id": "d0b2f8b4-20f8-45b4-b1a5-c81390d75b29",
|
104 |
-
"submission_name": "wav2vec2 non-english transcription",
|
105 |
-
"model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
|
106 |
-
"average_per": 0.6417205190285036,
|
107 |
-
"average_pwed": 0.19048963968896404,
|
108 |
-
"subset": "timit-test",
|
109 |
-
"github_url": "https://huggingface.co/ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
|
110 |
-
"submission_date": "2024-12-19T07:41:18.135985"
|
111 |
-
},
|
112 |
-
{
|
113 |
-
"submission_id": "3bbb0f03-31a5-45b0-bde3-bbf574f19983",
|
114 |
-
"submission_name": "phonetic transcription with the Buckeye corpus, from xlsr-53 model",
|
115 |
-
"model": "ginic/gender_split_70_female_4_wav2vec2-large-xlsr-buckeye-ipa",
|
116 |
-
"average_per": 0.2810165988557621,
|
117 |
-
"average_pwed": 0.10703377161801164,
|
118 |
-
"subset": "timit-test",
|
119 |
-
"github_url": "https://github.com/ginic/multipa/tree/buckeye_experiments",
|
120 |
-
"submission_date": "2024-12-20T13:45:52.010575"
|
121 |
-
},
|
122 |
-
{
|
123 |
-
"submission_id": "2ed095f7-4712-4539-87b6-1e8588ac92a3",
|
124 |
-
"submission_name": "phonetic transcription",
|
125 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.9.2WithoutSpaces",
|
126 |
-
"average_per": 0.9537775908999574,
|
127 |
-
"average_pwed": 0.9351204819224959,
|
128 |
-
"subset": "timit-test",
|
129 |
-
"github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces",
|
130 |
-
"submission_date": "2024-12-20T14:21:32.293694"
|
131 |
-
},
|
132 |
-
{
|
133 |
-
"submission_id": "9cf02ce8-fc43-4d23-a8bb-b44e3116a93c",
|
134 |
-
"submission_name": "Jubliano xlsr model",
|
135 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
|
136 |
-
"average_per": 0.9887075544197294,
|
137 |
-
"average_pwed": 0.9692486915717254,
|
138 |
-
"subset": "timit-test",
|
139 |
-
"github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-nl1.1",
|
140 |
-
"submission_date": "2024-12-20T15:40:51.632895"
|
141 |
-
},
|
142 |
-
{
|
143 |
-
"submission_id": "d5013845-f5c9-428a-8b39-7db066bb9f05",
|
144 |
-
"submission_name": "speech31 phoneme transcription english",
|
145 |
-
"model": "speech31/wavlm-large-english-ipa",
|
146 |
-
"average_per": 0.3694017596969614,
|
147 |
-
"average_pwed": 0.1356824900612308,
|
148 |
-
"subset": "timit-test",
|
149 |
-
"github_url": "https://huggingface.co/speech31/wavlm-large-english-ipa",
|
150 |
-
"submission_date": "2024-12-20T16:26:47.982209"
|
151 |
-
},
|
152 |
-
{
|
153 |
-
"submission_id": "362c788d-bc2e-427d-8c74-105f6235cf62",
|
154 |
-
"submission_name": "speech31 xlsr model",
|
155 |
-
"model": "speech31/XLS-R-300m-english-ipa",
|
156 |
-
"average_per": 0.36382554692045954,
|
157 |
-
"average_pwed": 0.1299702312124616,
|
158 |
-
"subset": "timit-test",
|
159 |
-
"github_url": "https://huggingface.co/speech31/XLS-R-300m-english-ipa",
|
160 |
-
"submission_date": "2024-12-20T16:47:54.826509"
|
161 |
-
},
|
162 |
-
{
|
163 |
-
"submission_id": "49e22782-0af1-4313-bc0c-60cb2f28d78f",
|
164 |
-
"submission_name": "model is a fine-tuned version of facebook/wav2vec2-large on the TIMIT dataset",
|
165 |
-
"model": "speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
|
166 |
-
"average_per": 0.44563344149564776,
|
167 |
-
"average_pwed": 0.18844914029048124,
|
168 |
-
"subset": "timit-test",
|
169 |
-
"github_url": "https://huggingface.co/speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
|
170 |
-
"submission_date": "2024-12-20T17:05:35.213738"
|
171 |
-
},
|
172 |
-
{
|
173 |
-
"submission_id": "26c04108-1131-435c-95f1-bb56b2aff06c",
|
174 |
-
"submission_name": "fine-tuned version of facebook/wav2vec2-large on the None dataset",
|
175 |
-
"model": "speech31/wav2vec2-large-TIMIT-IPA2",
|
176 |
-
"average_per": 0.4847029843149011,
|
177 |
-
"average_pwed": 0.2072006544586948,
|
178 |
-
"subset": "timit-test",
|
179 |
-
"github_url": "https://huggingface.co/speech31/wav2vec2-large-TIMIT-IPA2",
|
180 |
-
"submission_date": "2024-12-20T22:50:50.645178"
|
181 |
-
},
|
182 |
-
{
|
183 |
-
"submission_id": "4126d265-418f-4d11-8a29-4e69f064f1dd",
|
184 |
-
"submission_name": "ginic model, facebook/wav2vec2-large-xlsr-53 fine tuned",
|
185 |
-
"model": "ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
|
186 |
-
"average_per": 0.2807914104790719,
|
187 |
-
"average_pwed": 0.10494355278037441,
|
188 |
-
"subset": "timit-test",
|
189 |
-
"github_url": "https://huggingface.co/ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
|
190 |
-
"submission_date": "2024-12-21T01:31:04.862397"
|
191 |
-
}
|
192 |
-
]
|
app/queue/results.json
DELETED
@@ -1,1014 +0,0 @@
|
|
1 |
-
[
|
2 |
-
{
|
3 |
-
"task_id": "721b4c64-a825-42d3-bb0a-bdff9ee1ed0f",
|
4 |
-
"model": "facebook/wav2vec2-lv-60-espeak-cv-ft",
|
5 |
-
"subset": "timit-test",
|
6 |
-
"num_files": 1680,
|
7 |
-
"average_per": 0.33667301260691423,
|
8 |
-
"average_pwed": 0.1276725657099669,
|
9 |
-
"detailed_results": [
|
10 |
-
{
|
11 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
12 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
13 |
-
"prediction": "ʃiːhædjɚdɑːɹksuːɾɪnɡɹiːsiwɑːʃwɑːɾɚɹɑːljiː",
|
14 |
-
"per": 0.3939393939393939,
|
15 |
-
"pwed": 0.13888888888888887
|
16 |
-
},
|
17 |
-
{
|
18 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
19 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
20 |
-
"prediction": "doʊntæskmiːtəkæɹiɐnoɪliɹæɡlaɪkðæt",
|
21 |
-
"per": 0.32142857142857145,
|
22 |
-
"pwed": 0.13541666666666666
|
23 |
-
},
|
24 |
-
{
|
25 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
26 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
27 |
-
"prediction": "hɪzkæptənwʌzθɪnændhæɡɚdændhɪzbjuːɾɪfəlbuːtswɜːwɔːɹnændʃæbi",
|
28 |
-
"per": 0.3617021276595745,
|
29 |
-
"pwed": 0.13915094339622644
|
30 |
-
},
|
31 |
-
{
|
32 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
33 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
34 |
-
"prediction": "ðəɹiːzənzfɜːðɪsdaɪvsiːmdfuːlɪʃnaʊ",
|
35 |
-
"per": 0.20689655172413793,
|
36 |
-
"pwed": 0.022988505747126433
|
37 |
-
},
|
38 |
-
{
|
39 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
40 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
41 |
-
"prediction": "pɹədʌkʃənmeɪfɔːlfɑːɹbᵻloʊɛkspɛkteɪʃənz",
|
42 |
-
"per": 0.36363636363636365,
|
43 |
-
"pwed": 0.1392857142857143
|
44 |
-
}
|
45 |
-
],
|
46 |
-
"timestamp": "2024-12-05T07:32:06.849017"
|
47 |
-
},
|
48 |
-
{
|
49 |
-
"task_id": "d6fe0956-b5b4-4105-835e-8dee1872ee4d",
|
50 |
-
"model": "KoelLabs/xlsr-timit-b0",
|
51 |
-
"subset": "timit-test",
|
52 |
-
"num_files": 1680,
|
53 |
-
"average_per": 0.12572285528714347,
|
54 |
-
"average_pwed": 0.06476636812791145,
|
55 |
-
"detailed_results": [
|
56 |
-
{
|
57 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
58 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
59 |
-
"prediction": "ʃihædjɹdɑɹksuɾɪnɡɹisiwɑʃwɔɾɹʔɔljɪɹ",
|
60 |
-
"per": 0.12121212121212122,
|
61 |
-
"pwed": 0.037990196078431376
|
62 |
-
},
|
63 |
-
{
|
64 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
65 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
66 |
-
"prediction": "oʊnæskmitikæɹinɔɪliɹæɡlaɪkðæt",
|
67 |
-
"per": 0.14285714285714285,
|
68 |
-
"pwed": 0.10632183908045977
|
69 |
-
},
|
70 |
-
{
|
71 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
72 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
73 |
-
"prediction": "hɪzkæpinwəsθɪnhæɡɹdinizbjuɾiflbutswɹwɔɹninʃæbi",
|
74 |
-
"per": 0.10638297872340426,
|
75 |
-
"pwed": 0.0425531914893617
|
76 |
-
},
|
77 |
-
{
|
78 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
79 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
80 |
-
"prediction": "ðəɹiznzfɹðistaɪvsimdfuliʃnaʊ",
|
81 |
-
"per": 0.13793103448275862,
|
82 |
-
"pwed": 0.04166666666666667
|
83 |
-
},
|
84 |
-
{
|
85 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
86 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
87 |
-
"prediction": "pɹdʌkʃnmeɪfɔlfɑɹbloʊɛkspɛkeɪʃəns",
|
88 |
-
"per": 0.21212121212121213,
|
89 |
-
"pwed": 0.10858585858585859
|
90 |
-
}
|
91 |
-
],
|
92 |
-
"timestamp": "2024-12-05T08:25:24.980111"
|
93 |
-
},
|
94 |
-
{
|
95 |
-
"task_id": "dbf4642a-fb13-402c-8a74-cc41fc4be599",
|
96 |
-
"model": "speech31/wav2vec2-large-TIMIT-IPA",
|
97 |
-
"subset": "timit-test",
|
98 |
-
"num_files": 1680,
|
99 |
-
"average_per": 0.4415425496841929,
|
100 |
-
"average_pwed": 0.18625930002594002,
|
101 |
-
"detailed_results": [
|
102 |
-
{
|
103 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
104 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
105 |
-
"prediction": "ʃihædjʊrdɑrksutɪngrisiwɑʃwɔtərɔljɪrrrɪrɪrʃ",
|
106 |
-
"per": 0.5757575757575758,
|
107 |
-
"pwed": 0.25
|
108 |
-
},
|
109 |
-
{
|
110 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
111 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
112 |
-
"prediction": "doʊntæskmitɪkɛri��nɔɪliræglaɪkðəttm",
|
113 |
-
"per": 0.35714285714285715,
|
114 |
-
"pwed": 0.172979797979798
|
115 |
-
},
|
116 |
-
{
|
117 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
118 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
119 |
-
"prediction": "hɪzkæptɪnwɑzθɪnəndhægərdəndhɪzbjutəfəlbutswərwɔrnəndʃæbi",
|
120 |
-
"per": 0.40425531914893614,
|
121 |
-
"pwed": 0.17500000000000004
|
122 |
-
},
|
123 |
-
{
|
124 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
125 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
126 |
-
"prediction": "ðərizɪənzfərðɪstaɪvsimdfulɪʃnaʊaʊaʊ",
|
127 |
-
"per": 0.3793103448275862,
|
128 |
-
"pwed": 0.18928571428571428
|
129 |
-
},
|
130 |
-
{
|
131 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
132 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
133 |
-
"prediction": "prədəkʃənmeɪfɔlfɑrbɪloʊɛkspɛkteɪʃənzd",
|
134 |
-
"per": 0.3939393939393939,
|
135 |
-
"pwed": 0.13626126126126126
|
136 |
-
}
|
137 |
-
],
|
138 |
-
"timestamp": "2024-12-05T09:36:14.568321"
|
139 |
-
},
|
140 |
-
{
|
141 |
-
"task_id": "912449a4-d7ed-4af4-b5be-5c2c57ec09ff",
|
142 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5",
|
143 |
-
"subset": "timit-test",
|
144 |
-
"num_files": 1680,
|
145 |
-
"average_per": 0.6318471187460027,
|
146 |
-
"average_pwed": 0.222932144739126,
|
147 |
-
"detailed_results": [
|
148 |
-
{
|
149 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
150 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
151 |
-
"prediction": "ʒihɛldjydɑrksydənrisiwɑswadərɑlhir",
|
152 |
-
"per": 0.5454545454545454,
|
153 |
-
"pwed": 0.11764705882352941
|
154 |
-
},
|
155 |
-
{
|
156 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
157 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
158 |
-
"prediction": "dɑnraːstɪkmədəkaːrənoːjliralɪkaːn",
|
159 |
-
"per": 0.7857142857142857,
|
160 |
-
"pwed": 0.2341954022988506
|
161 |
-
},
|
162 |
-
{
|
163 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
164 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
165 |
-
"prediction": "xisʃktəʋɑstɪnɛnhɛɪɡərdɛnenzbjudəvɔlbutvɔːrʋɔrnənʃaːbi",
|
166 |
-
"per": 0.6595744680851063,
|
167 |
-
"pwed": 0.18382352941176472
|
168 |
-
},
|
169 |
-
{
|
170 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
171 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
172 |
-
"prediction": "dərizənsvərdəstajfzimtvuləsna",
|
173 |
-
"per": 0.6206896551724138,
|
174 |
-
"pwed": 0.11781609195402297
|
175 |
-
},
|
176 |
-
{
|
177 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
178 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
179 |
-
"prediction": "pːdkəmeːvɑlvɑrbəloɛkspɛkteːʃəns",
|
180 |
-
"per": 0.5454545454545454,
|
181 |
-
"pwed": 0.2171717171717172
|
182 |
-
}
|
183 |
-
],
|
184 |
-
"timestamp": "2024-12-05T10:17:21.331572"
|
185 |
-
},
|
186 |
-
{
|
187 |
-
"task_id": "c79df17e-2bb2-4253-ae26-f7cc6ab21265",
|
188 |
-
"model": "facebook/wav2vec2-xlsr-53-espeak-cv-ft",
|
189 |
-
"subset": "timit-test",
|
190 |
-
"num_files": 1680,
|
191 |
-
"average_per": 0.348845592557092,
|
192 |
-
"average_pwed": 0.1386742019529415,
|
193 |
-
"detailed_results": [
|
194 |
-
{
|
195 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
196 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
197 |
-
"prediction": "ʃiːhædjɚdksuːtɪnɡɹiːsiwɑːʃwɑːɾɚɑːljɪ",
|
198 |
-
"per": 0.48484848484848486,
|
199 |
-
"pwed": 0.21338383838383837
|
200 |
-
},
|
201 |
-
{
|
202 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
203 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
204 |
-
"prediction": "doːntæskmitəkæɹiənoɪliɹæɡlaɪkðæt",
|
205 |
-
"per": 0.32142857142857145,
|
206 |
-
"pwed": 0.12634408602150538
|
207 |
-
},
|
208 |
-
{
|
209 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
210 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
211 |
-
"prediction": "hɪzkæptənwʌzθɪnænhæɡɚdændhɪzbjuːɾɪfʊbuːtswɚwoːnəndʃæbi",
|
212 |
-
"per": 0.3617021276595745,
|
213 |
-
"pwed": 0.13095238095238093
|
214 |
-
},
|
215 |
-
{
|
216 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
217 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
218 |
-
"prediction": "ðəɹiːzənzfɚðəsdɑːvsiːmdfuːlɪʃnæ",
|
219 |
-
"per": 0.3793103448275862,
|
220 |
-
"pwed": 0.12068965517241376
|
221 |
-
},
|
222 |
-
{
|
223 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
224 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
225 |
-
"prediction": "pɹədʌkʃənmeɪfɑːlfɑːbəloʊɛkspɛkteɪʃənz",
|
226 |
-
"per": 0.36363636363636365,
|
227 |
-
"pwed": 0.14404761904761906
|
228 |
-
}
|
229 |
-
],
|
230 |
-
"timestamp": "2024-12-05T10:34:26.154521"
|
231 |
-
},
|
232 |
-
{
|
233 |
-
"task_id": "f36060e6-a746-44dc-a527-54995b270053",
|
234 |
-
"model": "ginic/hyperparam_tuning_1_wav2vec2-large-xlsr-buckeye-ipa",
|
235 |
-
"subset": "timit-test",
|
236 |
-
"num_files": 1680,
|
237 |
-
"average_per": 0.2766466385175833,
|
238 |
-
"average_pwed": 0.10410683992600853,
|
239 |
-
"detailed_results": [
|
240 |
-
{
|
241 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
242 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
243 |
-
"prediction": "ʃihædjɹ̩dɑɹksuɾɪnɡɹeɪsiwɑʃwɔɾɹ̩ɔljiɹ",
|
244 |
-
"per": 0.24242424242424243,
|
245 |
-
"pwed": 0.09926470588235292
|
246 |
-
},
|
247 |
-
{
|
248 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
249 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
250 |
-
"prediction": "doʊndæskmidɪkæɹiɛnɔɪliɹæɡlaɪkðæʔ",
|
251 |
-
"per": 0.32142857142857145,
|
252 |
-
"pwed": 0.14192708333333334
|
253 |
-
},
|
254 |
-
{
|
255 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
256 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
257 |
-
"prediction": "hɪzkæptɪnwʌzθɪnɛnhæɡɹ̩dɛnɪzbjuɾʌfl̩butswɹ̩wɔɹnɛnʃæbi",
|
258 |
-
"per": 0.2553191489361702,
|
259 |
-
"pwed": 0.05357142857142857
|
260 |
-
},
|
261 |
-
{
|
262 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
263 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
264 |
-
"prediction": "ðʌɹizʌnzfɹ̩ðʌstaɪvsimdfulɪʃnaʊ",
|
265 |
-
"per": 0.20689655172413793,
|
266 |
-
"pwed": 0.01293103448275862
|
267 |
-
},
|
268 |
-
{
|
269 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
270 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
271 |
-
"prediction": "pɹʌdʌkʃʌnmeɪfɔlfɑɹbʌloʊɛkspɛkteɪʃʌns",
|
272 |
-
"per": 0.2727272727272727,
|
273 |
-
"pwed": 0.10416666666666667
|
274 |
-
}
|
275 |
-
],
|
276 |
-
"timestamp": "2024-12-05T11:06:07.981224"
|
277 |
-
},
|
278 |
-
{
|
279 |
-
"task_id": "47d56349-8111-4bda-a47f-e007dbedd36d",
|
280 |
-
"model": "KoelLabs/xlsr-timit-a0",
|
281 |
-
"subset": "timit-test",
|
282 |
-
"num_files": 1680,
|
283 |
-
"average_per": 0.24242141955346685,
|
284 |
-
"average_pwed": 0.17395311976938,
|
285 |
-
"detailed_results": [
|
286 |
-
{
|
287 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
288 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
289 |
-
"prediction": "ʃihædjɹdɑɹksuɾɪnɡɹisiwɑʃwɔɾɹʔɔljɪɹ",
|
290 |
-
"per": 0.12121212121212122,
|
291 |
-
"pwed": 0.037990196078431376
|
292 |
-
},
|
293 |
-
{
|
294 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
295 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
296 |
-
"prediction": "ɪoʊnæskmitikæɹinɔɪliɹæɡlaɪkðt",
|
297 |
-
"per": 0.21428571428571427,
|
298 |
-
"pwed": 0.1695402298850575
|
299 |
-
},
|
300 |
-
{
|
301 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
302 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
303 |
-
"prediction": "hɪzkæpinwəsθɪninhæɡɹdinhizbjuɾiflbutswɹwɔɹnintʃæbi",
|
304 |
-
"per": 0.1276595744680851,
|
305 |
-
"pwed": 0.06499999999999999
|
306 |
-
},
|
307 |
-
{
|
308 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
309 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
310 |
-
"prediction": "ðəɹiznzfɹðistaɪ",
|
311 |
-
"per": 0.5862068965517241,
|
312 |
-
"pwed": 0.4899425287356322
|
313 |
-
},
|
314 |
-
{
|
315 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
316 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
317 |
-
"prediction": "ɹidʌkʃinmeɪfɔlfɑɹbəloʊɛkspɛkeɪ",
|
318 |
-
"per": 0.21212121212121213,
|
319 |
-
"pwed": 0.1553030303030303
|
320 |
-
}
|
321 |
-
],
|
322 |
-
"timestamp": "2024-12-12T15:53:07.584096"
|
323 |
-
},
|
324 |
-
{
|
325 |
-
"task_id": "51dd5735-63bd-4fe5-a588-c0fc079076e0",
|
326 |
-
"model": "KoelLabs/xlsr-timit-a0",
|
327 |
-
"subset": "timit-test",
|
328 |
-
"num_files": 1680,
|
329 |
-
"average_per": 0.24242141955346685,
|
330 |
-
"average_pwed": 0.17395311976938,
|
331 |
-
"detailed_results": [
|
332 |
-
{
|
333 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
334 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
335 |
-
"prediction": "ʃihædjɹdɑɹksuɾɪnɡɹisiwɑʃwɔɾɹʔɔljɪɹ",
|
336 |
-
"per": 0.12121212121212122,
|
337 |
-
"pwed": 0.037990196078431376
|
338 |
-
},
|
339 |
-
{
|
340 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
341 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
342 |
-
"prediction": "ɪoʊnæskmitikæɹinɔɪliɹæɡlaɪkðt",
|
343 |
-
"per": 0.21428571428571427,
|
344 |
-
"pwed": 0.1695402298850575
|
345 |
-
},
|
346 |
-
{
|
347 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
348 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
349 |
-
"prediction": "hɪzkæpinwəsθɪninhæɡɹdinhizbjuɾiflbutswɹwɔɹnintʃæbi",
|
350 |
-
"per": 0.1276595744680851,
|
351 |
-
"pwed": 0.06499999999999999
|
352 |
-
},
|
353 |
-
{
|
354 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
355 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
356 |
-
"prediction": "ðəɹiznzfɹðistaɪ",
|
357 |
-
"per": 0.5862068965517241,
|
358 |
-
"pwed": 0.4899425287356322
|
359 |
-
},
|
360 |
-
{
|
361 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
362 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
363 |
-
"prediction": "ɹidʌkʃinmeɪfɔlfɑɹbəloʊɛkspɛkeɪ",
|
364 |
-
"per": 0.21212121212121213,
|
365 |
-
"pwed": 0.1553030303030303
|
366 |
-
}
|
367 |
-
],
|
368 |
-
"timestamp": "2024-12-12T16:07:25.389475"
|
369 |
-
},
|
370 |
-
{
|
371 |
-
"task_id": "2e592612-ca38-4afb-a6a0-3c870b288960",
|
372 |
-
"model": "snu-nia-12/wav2vec2-large_nia12_phone-ipa_english",
|
373 |
-
"subset": "timit-test",
|
374 |
-
"num_files": 1680,
|
375 |
-
"average_per": 0.4847029843149011,
|
376 |
-
"average_pwed": 0.2072006544586948,
|
377 |
-
"detailed_results": [
|
378 |
-
{
|
379 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
380 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
381 |
-
"prediction": "ʃihædjʊrdɑrksutɪngrisiwɑʃwɔtərɔljɪrər",
|
382 |
-
"per": 0.42424242424242425,
|
383 |
-
"pwed": 0.15393518518518517
|
384 |
-
},
|
385 |
-
{
|
386 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
387 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
388 |
-
"prediction": "doʊntæskmitɪkɛriənɔɪliræglaɪkðətdoʊndt",
|
389 |
-
"per": 0.5,
|
390 |
-
"pwed": 0.2623873873873874
|
391 |
-
},
|
392 |
-
{
|
393 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
394 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
395 |
-
"prediction": "hɪzkæptənwɑzθɪnəndhægərdəndhɪzbjutəfəlbutswərwɔrnəndʃæbiiii",
|
396 |
-
"per": 0.46808510638297873,
|
397 |
-
"pwed": 0.2191091954022989
|
398 |
-
},
|
399 |
-
{
|
400 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
401 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
402 |
-
"prediction": "ðərizənzfərðɪstaɪvsimdfulɪʃnaʊ",
|
403 |
-
"per": 0.20689655172413793,
|
404 |
-
"pwed": 0.054166666666666675
|
405 |
-
},
|
406 |
-
{
|
407 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
408 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
409 |
-
"prediction": "prədəkʃənmeɪfɔlfɑrbɪloʊɛkspɛkteɪʃənzpzppppzpdtdtd",
|
410 |
-
"per": 0.7272727272727273,
|
411 |
-
"pwed": 0.34438775510204084
|
412 |
-
}
|
413 |
-
],
|
414 |
-
"timestamp": "2024-12-18T22:01:20.853274"
|
415 |
-
},
|
416 |
-
{
|
417 |
-
"task_id": "d38e65ce-75b5-4dbf-8ade-bff6a5803790",
|
418 |
-
"model": "vitouphy/wav2vec2-xls-r-300m-timit-phoneme",
|
419 |
-
"subset": "timit-test",
|
420 |
-
"num_files": 1680,
|
421 |
-
"average_per": 0.2561961414705681,
|
422 |
-
"average_pwed": 0.1378394393452702,
|
423 |
-
"detailed_results": [
|
424 |
-
{
|
425 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
426 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
427 |
-
"prediction": "ʃihædjɝdɑɹksuɾɪngɹisiwɑʃwɑɾɝɑljiɝ",
|
428 |
-
"per": 0.18181818181818182,
|
429 |
-
"pwed": 0.13257575757575757
|
430 |
-
},
|
431 |
-
{
|
432 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
433 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
434 |
-
"prediction": "doʊnæskmitɪkæɹiɪnɔɪliɹæglaɪkðæ",
|
435 |
-
"per": 0.21428571428571427,
|
436 |
-
"pwed": 0.10919540229885057
|
437 |
-
},
|
438 |
-
{
|
439 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
440 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
441 |
-
"prediction": "hɪzkætɪnwəsθɪnənhægɝdɪnɪzbjuɾɪflbutswɝwɑɹnɪnʃæbi",
|
442 |
-
"per": 0.19148936170212766,
|
443 |
-
"pwed": 0.0576241134751773
|
444 |
-
},
|
445 |
-
{
|
446 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
447 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
448 |
-
"prediction": "ðɪɹizənzfɝðɪsdaɪvsimdfulɪʃnaʊ",
|
449 |
-
"per": 0.10344827586206896,
|
450 |
-
"pwed": 0.03735632183908046
|
451 |
-
},
|
452 |
-
{
|
453 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
454 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
455 |
-
"prediction": "pɹɝdəkʃɪnmeɪfɑlfɹbloʊɛkspɛteɪʃɪns",
|
456 |
-
"per": 0.3333333333333333,
|
457 |
-
"pwed": 0.12373737373737376
|
458 |
-
}
|
459 |
-
],
|
460 |
-
"timestamp": "2024-12-18T22:50:59.625872"
|
461 |
-
},
|
462 |
-
{
|
463 |
-
"task_id": "2839c0c6-8f3b-426e-9eb7-04b6e133dc47",
|
464 |
-
"model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa-plus-2000",
|
465 |
-
"subset": "timit-test",
|
466 |
-
"num_files": 1680,
|
467 |
-
"average_per": 0.6479484324708775,
|
468 |
-
"average_pwed": 0.18710002665151734,
|
469 |
-
"detailed_results": [
|
470 |
-
{
|
471 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
472 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
473 |
-
"prediction": "ʂixadjodarksyːdɨnɡwisiwaːʃwarɒɔjiːr",
|
474 |
-
"per": 0.6060606060606061,
|
475 |
-
"pwed": 0.15404040404040406
|
476 |
-
},
|
477 |
-
{
|
478 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
479 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
480 |
-
"prediction": "dondaːskmiːdɨkɛːɻjɒnojluiʋɻaːɡlɑjɡtaːn",
|
481 |
-
"per": 0.8928571428571429,
|
482 |
-
"pwed": 0.2146464646464646
|
483 |
-
},
|
484 |
-
{
|
485 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
486 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
487 |
-
"prediction": "hizkaːptanustinanhagɛɻdɛnizbiurufubutswuɾʋoːɻninʂaːbi",
|
488 |
-
"per": 0.5106382978723404,
|
489 |
-
"pwed": 0.1096938775510204
|
490 |
-
},
|
491 |
-
{
|
492 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
493 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
494 |
-
"prediction": "ðrisɔnsfrdɔsdaːjvsimtfulɛʂnɛ",
|
495 |
-
"per": 0.5172413793103449,
|
496 |
-
"pwed": 0.11063218390804598
|
497 |
-
},
|
498 |
-
{
|
499 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
500 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
501 |
-
"prediction": "pɛdakɕɔnmɛjfaɔfarbuwɔwɛkspɛktajʂɔnt͡s",
|
502 |
-
"per": 0.7272727272727273,
|
503 |
-
"pwed": 0.15
|
504 |
-
}
|
505 |
-
],
|
506 |
-
"timestamp": "2024-12-18T23:29:27.320433"
|
507 |
-
},
|
508 |
-
{
|
509 |
-
"task_id": "59afc37a-0072-44dd-a02a-0cf47d89c120",
|
510 |
-
"model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
|
511 |
-
"subset": "timit-test",
|
512 |
-
"num_files": 1680,
|
513 |
-
"average_per": 0.6417205190285036,
|
514 |
-
"average_pwed": 0.19048963968896404,
|
515 |
-
"detailed_results": [
|
516 |
-
{
|
517 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
518 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
519 |
-
"prediction": "ʂiharjoɖarksɯudenɡwisiwaːʂwarɔːjiːr",
|
520 |
-
"per": 0.696969696969697,
|
521 |
-
"pwed": 0.20580808080808083
|
522 |
-
},
|
523 |
-
{
|
524 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
525 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
526 |
-
"prediction": "dɔndaːskmidɨkaːɻjɑno̞jwɯräːɡläikθaːn",
|
527 |
-
"per": 0.8214285714285714,
|
528 |
-
"pwed": 0.17338709677419356
|
529 |
-
},
|
530 |
-
{
|
531 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
532 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
533 |
-
"prediction": "çizkatːɛnwɔstinanhaːɡɛɾdanɨzbirufubuswɔwoːɾnenʂaːbi",
|
534 |
-
"per": 0.5531914893617021,
|
535 |
-
"pwed": 0.1276595744680851
|
536 |
-
},
|
537 |
-
{
|
538 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
539 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
540 |
-
"prediction": "ðɔriːzɔnsfɾdɔɕtaːivsimtfuøʃnɛu",
|
541 |
-
"per": 0.5862068965517241,
|
542 |
-
"pwed": 0.08764367816091957
|
543 |
-
},
|
544 |
-
{
|
545 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
546 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
547 |
-
"prediction": "pɾɔdakʂɔnmɛjfaɔfaɾbuwɔuwɛkspɛktajʂons",
|
548 |
-
"per": 0.7575757575757576,
|
549 |
-
"pwed": 0.18806306306306303
|
550 |
-
}
|
551 |
-
],
|
552 |
-
"timestamp": "2024-12-19T07:41:18.132953"
|
553 |
-
},
|
554 |
-
{
|
555 |
-
"task_id": "5517f6b2-6a76-4a2d-a6ce-33446f390c3b",
|
556 |
-
"model": "ginic/gender_split_70_female_4_wav2vec2-large-xlsr-buckeye-ipa",
|
557 |
-
"subset": "timit-test",
|
558 |
-
"num_files": 1680,
|
559 |
-
"average_per": 0.2810165988557621,
|
560 |
-
"average_pwed": 0.10703377161801164,
|
561 |
-
"detailed_results": [
|
562 |
-
{
|
563 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
564 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
565 |
-
"prediction": "ʃihædjɹ̩dɑɹksudɪnɡɹisiwɑʃwɑɾɹ̩ɔljiɹ",
|
566 |
-
"per": 0.18181818181818182,
|
567 |
-
"pwed": 0.07196969696969698
|
568 |
-
},
|
569 |
-
{
|
570 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
571 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
572 |
-
"prediction": "doʊndæskmitɪkæɹiʌnɔɪliɹæɡlaɪkðæʔ",
|
573 |
-
"per": 0.2857142857142857,
|
574 |
-
"pwed": 0.14062500000000003
|
575 |
-
},
|
576 |
-
{
|
577 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
578 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
579 |
-
"prediction": "hɪzkæptʌnwʌzθɪnhæɡɹ̩dɛnɪzbjuɾʌfl̩butswɹ̩wɔʊɹnɪnʃæbi",
|
580 |
-
"per": 0.2978723404255319,
|
581 |
-
"pwed": 0.09114583333333333
|
582 |
-
},
|
583 |
-
{
|
584 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
585 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
586 |
-
"prediction": "ðʌɹizʌnzfɹ̩ðʌstaɪvsimtfulɪʃnaʊ",
|
587 |
-
"per": 0.2413793103448276,
|
588 |
-
"pwed": 0.014367816091954023
|
589 |
-
},
|
590 |
-
{
|
591 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
592 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
593 |
-
"prediction": "pɹʌdʌkʃʌnmeɪfɔlfɑɹbʌloʊɛkspɛkteɪʃʌnz",
|
594 |
-
"per": 0.30303030303030304,
|
595 |
-
"pwed": 0.10532407407407407
|
596 |
-
}
|
597 |
-
],
|
598 |
-
"timestamp": "2024-12-20T13:45:52.009233"
|
599 |
-
},
|
600 |
-
{
|
601 |
-
"task_id": "c2139f96-e79e-4f25-a525-aa039f65555f",
|
602 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.9.2WithoutSpaces",
|
603 |
-
"subset": "timit-test",
|
604 |
-
"num_files": 1680,
|
605 |
-
"average_per": 0.9537775908999574,
|
606 |
-
"average_pwed": 0.9351204819224959,
|
607 |
-
"detailed_results": [
|
608 |
-
{
|
609 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
610 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
611 |
-
"prediction": "iɛ2",
|
612 |
-
"per": 0.9696969696969697,
|
613 |
-
"pwed": 0.9406565656565656
|
614 |
-
},
|
615 |
-
{
|
616 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
617 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
618 |
-
"prediction": "iɛ2",
|
619 |
-
"per": 0.9285714285714286,
|
620 |
-
"pwed": 0.9285714285714286
|
621 |
-
},
|
622 |
-
{
|
623 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
624 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
625 |
-
"prediction": "iɛ2",
|
626 |
-
"per": 0.9787234042553191,
|
627 |
-
"pwed": 0.9583333333333333
|
628 |
-
},
|
629 |
-
{
|
630 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
631 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
632 |
-
"prediction": "iɛ2",
|
633 |
-
"per": 0.9655172413793104,
|
634 |
-
"pwed": 0.932471264367816
|
635 |
-
},
|
636 |
-
{
|
637 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
638 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
639 |
-
"prediction": "iɛ2",
|
640 |
-
"per": 0.9696969696969697,
|
641 |
-
"pwed": 0.9406565656565656
|
642 |
-
}
|
643 |
-
],
|
644 |
-
"timestamp": "2024-12-20T14:21:32.290889"
|
645 |
-
},
|
646 |
-
{
|
647 |
-
"task_id": "d146f1f1-6e6e-4b28-9420-c652ae9a1002",
|
648 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
|
649 |
-
"subset": "timit-test",
|
650 |
-
"num_files": 1680,
|
651 |
-
"average_per": 0.9887075544197294,
|
652 |
-
"average_pwed": 0.9692486915717254,
|
653 |
-
"detailed_results": [
|
654 |
-
{
|
655 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
656 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
657 |
-
"prediction": "p",
|
658 |
-
"per": 1.0,
|
659 |
-
"pwed": 0.9747474747474747
|
660 |
-
},
|
661 |
-
{
|
662 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
663 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
664 |
-
"prediction": "p",
|
665 |
-
"per": 1.0,
|
666 |
-
"pwed": 0.96875
|
667 |
-
},
|
668 |
-
{
|
669 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
670 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
671 |
-
"prediction": "p",
|
672 |
-
"per": 0.9787234042553191,
|
673 |
-
"pwed": 0.9787234042553191
|
674 |
-
},
|
675 |
-
{
|
676 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
677 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
678 |
-
"prediction": "p",
|
679 |
-
"per": 1.0,
|
680 |
-
"pwed": 0.9683908045977011
|
681 |
-
},
|
682 |
-
{
|
683 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
684 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
685 |
-
"prediction": "p",
|
686 |
-
"per": 0.9696969696969697,
|
687 |
-
"pwed": 0.9696969696969697
|
688 |
-
}
|
689 |
-
],
|
690 |
-
"timestamp": "2024-12-20T15:26:27.658798"
|
691 |
-
},
|
692 |
-
{
|
693 |
-
"task_id": "265c5859-e7ba-492d-a6c9-45733dc17c99",
|
694 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
|
695 |
-
"subset": "timit-test",
|
696 |
-
"num_files": 1680,
|
697 |
-
"average_per": 0.9887075544197294,
|
698 |
-
"average_pwed": 0.9692486915717254,
|
699 |
-
"detailed_results": [
|
700 |
-
{
|
701 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
702 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
703 |
-
"prediction": "p",
|
704 |
-
"per": 1.0,
|
705 |
-
"pwed": 0.9747474747474747
|
706 |
-
},
|
707 |
-
{
|
708 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
709 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
710 |
-
"prediction": "p",
|
711 |
-
"per": 1.0,
|
712 |
-
"pwed": 0.96875
|
713 |
-
},
|
714 |
-
{
|
715 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
716 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
717 |
-
"prediction": "p",
|
718 |
-
"per": 0.9787234042553191,
|
719 |
-
"pwed": 0.9787234042553191
|
720 |
-
},
|
721 |
-
{
|
722 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
723 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
724 |
-
"prediction": "p",
|
725 |
-
"per": 1.0,
|
726 |
-
"pwed": 0.9683908045977011
|
727 |
-
},
|
728 |
-
{
|
729 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
730 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
731 |
-
"prediction": "p",
|
732 |
-
"per": 0.9696969696969697,
|
733 |
-
"pwed": 0.9696969696969697
|
734 |
-
}
|
735 |
-
],
|
736 |
-
"timestamp": "2024-12-20T15:40:51.631218"
|
737 |
-
},
|
738 |
-
{
|
739 |
-
"task_id": "e297dfde-95e5-462b-a6e5-8fa43bc30bc0",
|
740 |
-
"model": "speech31/wavlm-large-english-ipa",
|
741 |
-
"subset": "timit-test",
|
742 |
-
"num_files": 1680,
|
743 |
-
"average_per": 0.3694017596969614,
|
744 |
-
"average_pwed": 0.1356824900612308,
|
745 |
-
"detailed_results": [
|
746 |
-
{
|
747 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
748 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
749 |
-
"prediction": "ʃihædjɔɹdɑɹksutɪnɡɹisiwɑʃwɔtɹ̩ɔljɪɹ",
|
750 |
-
"per": 0.2727272727272727,
|
751 |
-
"pwed": 0.11274509803921567
|
752 |
-
},
|
753 |
-
{
|
754 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
755 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
756 |
-
"prediction": "dɑntæskmitəkæɹiænojliɹæɡlajkðæt",
|
757 |
-
"per": 0.39285714285714285,
|
758 |
-
"pwed": 0.13575268817204303
|
759 |
-
},
|
760 |
-
{
|
761 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
762 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
763 |
-
"prediction": "hɪzkæpptənwɑzθɪændhæɡɹ̩dænhɪzbjutəfəlbutswɹ̩wɔɹnɪnʃæbi",
|
764 |
-
"per": 0.3404255319148936,
|
765 |
-
"pwed": 0.12980769230769232
|
766 |
-
},
|
767 |
-
{
|
768 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
769 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
770 |
-
"prediction": "ðəɹizənzfɔɹðəsdajvsimdfulɪʃnaw",
|
771 |
-
"per": 0.20689655172413793,
|
772 |
-
"pwed": 0.051388888888888894
|
773 |
-
},
|
774 |
-
{
|
775 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
776 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
777 |
-
"prediction": "pɹədʌkʃənmejffɔlfɔɑɹbɪlowɪkspɛktejʃənz",
|
778 |
-
"per": 0.45454545454545453,
|
779 |
-
"pwed": 0.16666666666666666
|
780 |
-
}
|
781 |
-
],
|
782 |
-
"timestamp": "2024-12-20T16:13:24.050232"
|
783 |
-
},
|
784 |
-
{
|
785 |
-
"task_id": "efe95f71-05e3-485d-8e0c-1823a3037cf4",
|
786 |
-
"model": "speech31/wavlm-large-english-ipa",
|
787 |
-
"subset": "timit-test",
|
788 |
-
"num_files": 1680,
|
789 |
-
"average_per": 0.3694017596969614,
|
790 |
-
"average_pwed": 0.1356824900612308,
|
791 |
-
"detailed_results": [
|
792 |
-
{
|
793 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
794 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
795 |
-
"prediction": "ʃihædjɔɹdɑɹksutɪnɡɹisiwɑʃwɔtɹ̩ɔljɪɹ",
|
796 |
-
"per": 0.2727272727272727,
|
797 |
-
"pwed": 0.11274509803921567
|
798 |
-
},
|
799 |
-
{
|
800 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
801 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
802 |
-
"prediction": "dɑntæskmitəkæɹiænojliɹæɡlajkðæt",
|
803 |
-
"per": 0.39285714285714285,
|
804 |
-
"pwed": 0.13575268817204303
|
805 |
-
},
|
806 |
-
{
|
807 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
808 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
809 |
-
"prediction": "hɪzkæpptənwɑzθɪændhæɡɹ̩dænhɪzbjutəfəlbutswɹ̩wɔɹnɪnʃæbi",
|
810 |
-
"per": 0.3404255319148936,
|
811 |
-
"pwed": 0.12980769230769232
|
812 |
-
},
|
813 |
-
{
|
814 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
815 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
816 |
-
"prediction": "ðəɹizənzfɔɹðəsdajvsimdfulɪʃnaw",
|
817 |
-
"per": 0.20689655172413793,
|
818 |
-
"pwed": 0.051388888888888894
|
819 |
-
},
|
820 |
-
{
|
821 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
822 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
823 |
-
"prediction": "pɹədʌkʃənmejffɔlfɔɑɹbɪlowɪkspɛktejʃənz",
|
824 |
-
"per": 0.45454545454545453,
|
825 |
-
"pwed": 0.16666666666666666
|
826 |
-
}
|
827 |
-
],
|
828 |
-
"timestamp": "2024-12-20T16:26:47.980084"
|
829 |
-
},
|
830 |
-
{
|
831 |
-
"task_id": "4b2ae2fc-fe2f-4f8b-9e8f-25c0bae13c0d",
|
832 |
-
"model": "speech31/XLS-R-300m-english-ipa",
|
833 |
-
"subset": "timit-test",
|
834 |
-
"num_files": 1680,
|
835 |
-
"average_per": 0.36382554692045954,
|
836 |
-
"average_pwed": 0.1299702312124616,
|
837 |
-
"detailed_results": [
|
838 |
-
{
|
839 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
840 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
841 |
-
"prediction": "ʃihædjɔɹdɑɹksutɪnɡɹisiwɑʃwɔtɹ̩ɔljɪɹ",
|
842 |
-
"per": 0.2727272727272727,
|
843 |
-
"pwed": 0.11274509803921567
|
844 |
-
},
|
845 |
-
{
|
846 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
847 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
848 |
-
"prediction": "dɑntæskmitəkæɹiænojliɹæɡlajkðæt",
|
849 |
-
"per": 0.39285714285714285,
|
850 |
-
"pwed": 0.13575268817204303
|
851 |
-
},
|
852 |
-
{
|
853 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
854 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
855 |
-
"prediction": "hɪzkæmptənwɑzθɪnændhæɡɹ̩dɪndhɪzbjutəfəlbutswɹ̩wɔɹnɪnʃæbi",
|
856 |
-
"per": 0.3404255319148936,
|
857 |
-
"pwed": 0.14583333333333334
|
858 |
-
},
|
859 |
-
{
|
860 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
861 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
862 |
-
"prediction": "ðəɹɛzənzfɔɹðɪstajvsimdfulɪʃnaw",
|
863 |
-
"per": 0.2413793103448276,
|
864 |
-
"pwed": 0.052777777777777785
|
865 |
-
},
|
866 |
-
{
|
867 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
868 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
869 |
-
"prediction": "pɹədʌkʃənmejfɔlfɑɹbɪlowɛkspɛktejʃənz",
|
870 |
-
"per": 0.3939393939393939,
|
871 |
-
"pwed": 0.11921296296296297
|
872 |
-
}
|
873 |
-
],
|
874 |
-
"timestamp": "2024-12-20T16:47:54.824174"
|
875 |
-
},
|
876 |
-
{
|
877 |
-
"task_id": "33d387c0-703c-415d-b8e2-81cea87a2146",
|
878 |
-
"model": "speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
|
879 |
-
"subset": "timit-test",
|
880 |
-
"num_files": 1680,
|
881 |
-
"average_per": 0.44563344149564776,
|
882 |
-
"average_pwed": 0.18844914029048124,
|
883 |
-
"detailed_results": [
|
884 |
-
{
|
885 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
886 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
887 |
-
"prediction": "ʃihædjʊrdɑrksutɪngrisiwɑʃwɔtərɔljɪrr",
|
888 |
-
"per": 0.3939393939393939,
|
889 |
-
"pwed": 0.12976190476190474
|
890 |
-
},
|
891 |
-
{
|
892 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
893 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
894 |
-
"prediction": "doʊntæskmitɪkɛriənɔɪliræglaɪkðətdnt",
|
895 |
-
"per": 0.39285714285714285,
|
896 |
-
"pwed": 0.19730392156862747
|
897 |
-
},
|
898 |
-
{
|
899 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
900 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
901 |
-
"prediction": "hɪzkæptənwɑzθɪnəndhægərdəndhɪzbjutəfəlbutswərwɔrnɪnʃæbibæb",
|
902 |
-
"per": 0.44680851063829785,
|
903 |
-
"pwed": 0.20394736842105265
|
904 |
-
},
|
905 |
-
{
|
906 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
907 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
908 |
-
"prediction": "ðərizənzfərðɪsstaɪvsimdfulɪʃnaʊa",
|
909 |
-
"per": 0.27586206896551724,
|
910 |
-
"pwed": 0.11328125
|
911 |
-
},
|
912 |
-
{
|
913 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
914 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
915 |
-
"prediction": "prədəkʃənmeɪfɔlfɑrbɪloʊɛkspɛkteɪʃənzd",
|
916 |
-
"per": 0.3939393939393939,
|
917 |
-
"pwed": 0.13626126126126126
|
918 |
-
}
|
919 |
-
],
|
920 |
-
"timestamp": "2024-12-20T17:05:35.210786"
|
921 |
-
},
|
922 |
-
{
|
923 |
-
"task_id": "c89bcefc-3884-435a-a54c-24297fe6f041",
|
924 |
-
"model": "speech31/wav2vec2-large-TIMIT-IPA2",
|
925 |
-
"subset": "timit-test",
|
926 |
-
"num_files": 1680,
|
927 |
-
"average_per": 0.4847029843149011,
|
928 |
-
"average_pwed": 0.2072006544586948,
|
929 |
-
"detailed_results": [
|
930 |
-
{
|
931 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
932 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
933 |
-
"prediction": "ʃihædjʊrdɑrksutɪngrisiwɑʃwɔtərɔljɪrər",
|
934 |
-
"per": 0.42424242424242425,
|
935 |
-
"pwed": 0.15393518518518517
|
936 |
-
},
|
937 |
-
{
|
938 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
939 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
940 |
-
"prediction": "doʊntæskmitɪkɛriənɔɪliræglaɪkðətdoʊndt",
|
941 |
-
"per": 0.5,
|
942 |
-
"pwed": 0.2623873873873874
|
943 |
-
},
|
944 |
-
{
|
945 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
946 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
947 |
-
"prediction": "hɪzkæptənwɑzθɪnəndhægərdəndhɪzbjutəfəlbutswərwɔrnəndʃæbiiii",
|
948 |
-
"per": 0.46808510638297873,
|
949 |
-
"pwed": 0.2191091954022989
|
950 |
-
},
|
951 |
-
{
|
952 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
953 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
954 |
-
"prediction": "ðərizənzfərðɪstaɪvsimdfulɪʃnaʊ",
|
955 |
-
"per": 0.20689655172413793,
|
956 |
-
"pwed": 0.054166666666666675
|
957 |
-
},
|
958 |
-
{
|
959 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
960 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
961 |
-
"prediction": "prədəkʃənmeɪfɔlfɑrbɪloʊɛkspɛkteɪʃənzpzppppzpdtdtd",
|
962 |
-
"per": 0.7272727272727273,
|
963 |
-
"pwed": 0.34438775510204084
|
964 |
-
}
|
965 |
-
],
|
966 |
-
"timestamp": "2024-12-20T22:50:50.641790"
|
967 |
-
},
|
968 |
-
{
|
969 |
-
"task_id": "81fa94f8-94ae-4601-952c-24abaddaf691",
|
970 |
-
"model": "ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
|
971 |
-
"subset": "timit-test",
|
972 |
-
"num_files": 1680,
|
973 |
-
"average_per": 0.2807914104790719,
|
974 |
-
"average_pwed": 0.10494355278037441,
|
975 |
-
"detailed_results": [
|
976 |
-
{
|
977 |
-
"file": "data/TEST/DR1/FAKS0/SA1.WAV",
|
978 |
-
"ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
|
979 |
-
"prediction": "ʃihædjɹdɑɹksuɾɪnɡɹisiwɔʃwɔɾɹ̩ɔljiɹ",
|
980 |
-
"per": 0.18181818181818182,
|
981 |
-
"pwed": 0.0744949494949495
|
982 |
-
},
|
983 |
-
{
|
984 |
-
"file": "data/TEST/DR1/FAKS0/SA2.WAV",
|
985 |
-
"ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
|
986 |
-
"prediction": "doʊndæskmidɪkæɹiɪnɔɪliɹæɡlaɪkðæʔ",
|
987 |
-
"per": 0.32142857142857145,
|
988 |
-
"pwed": 0.140625
|
989 |
-
},
|
990 |
-
{
|
991 |
-
"file": "data/TEST/DR1/FAKS0/SI1573.WAV",
|
992 |
-
"ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
|
993 |
-
"prediction": "hɪzkæptʌnwʌzθɪnɛnhæɡɹ̩dɛnɪzbjuɾʌfl̩butswɹ̩wɔɹnɪnʃæbi",
|
994 |
-
"per": 0.2553191489361702,
|
995 |
-
"pwed": 0.05357142857142856
|
996 |
-
},
|
997 |
-
{
|
998 |
-
"file": "data/TEST/DR1/FAKS0/SI2203.WAV",
|
999 |
-
"ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
|
1000 |
-
"prediction": "ðʌɹizʌn̩zfɹðʌstaɪvsimtfulɪʃnaʊ",
|
1001 |
-
"per": 0.2413793103448276,
|
1002 |
-
"pwed": 0.014367816091954023
|
1003 |
-
},
|
1004 |
-
{
|
1005 |
-
"file": "data/TEST/DR1/FAKS0/SI943.WAV",
|
1006 |
-
"ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
|
1007 |
-
"prediction": "pɹʌdʌkʃn̩meɪfɔlfɑɹbʌloʊɛkspɛkteɪʃʌns",
|
1008 |
-
"per": 0.30303030303030304,
|
1009 |
-
"pwed": 0.12023809523809523
|
1010 |
-
}
|
1011 |
-
],
|
1012 |
-
"timestamp": "2024-12-21T01:31:04.859070"
|
1013 |
-
}
|
1014 |
-
]
app/queue/tasks.json
DELETED
@@ -1,237 +0,0 @@
|
|
1 |
-
[
|
2 |
-
{
|
3 |
-
"id": "721b4c64-a825-42d3-bb0a-bdff9ee1ed0f",
|
4 |
-
"model": "facebook/wav2vec2-lv-60-espeak-cv-ft",
|
5 |
-
"subset": "timit-test",
|
6 |
-
"submission_name": "facebook espeak",
|
7 |
-
"github_url": "https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md",
|
8 |
-
"status": "completed",
|
9 |
-
"submitted_at": "2024-12-05T07:19:03.076292"
|
10 |
-
},
|
11 |
-
{
|
12 |
-
"id": "d6fe0956-b5b4-4105-835e-8dee1872ee4d",
|
13 |
-
"model": "KoelLabs/xlsr-timit-b0",
|
14 |
-
"subset": "timit-test",
|
15 |
-
"submission_name": "english phoneme model",
|
16 |
-
"github_url": "https://github.com/KoelLabs/",
|
17 |
-
"status": "completed",
|
18 |
-
"submitted_at": "2024-12-05T08:12:40.161444"
|
19 |
-
},
|
20 |
-
{
|
21 |
-
"id": "dbf4642a-fb13-402c-8a74-cc41fc4be599",
|
22 |
-
"model": "speech31/wav2vec2-large-TIMIT-IPA",
|
23 |
-
"subset": "timit-test",
|
24 |
-
"submission_name": "speech 31 model",
|
25 |
-
"github_url": "https://huggingface.co/speech31/wav2vec2-large-TIMIT-IPA2",
|
26 |
-
"status": "completed",
|
27 |
-
"submitted_at": "2024-12-05T09:13:45.315361"
|
28 |
-
},
|
29 |
-
{
|
30 |
-
"id": "4e3b80be-b255-47f2-b4ae-18a12e232e8a",
|
31 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5",
|
32 |
-
"subset": "timit-test",
|
33 |
-
"submission_name": "Jubliano model",
|
34 |
-
"github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces/tree/d5312009d8e620b183c334dfdd9ffc6b4f06f8c1",
|
35 |
-
"status": "processing",
|
36 |
-
"submitted_at": "2024-12-05T09:36:14.571930"
|
37 |
-
},
|
38 |
-
{
|
39 |
-
"id": "912449a4-d7ed-4af4-b5be-5c2c57ec09ff",
|
40 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5",
|
41 |
-
"subset": "timit-test",
|
42 |
-
"submission_name": "jubiliano model wav2vec2",
|
43 |
-
"github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces/tree/d5312009d8e620b183c334dfdd9ffc6b4f06f8c1",
|
44 |
-
"status": "completed",
|
45 |
-
"submitted_at": "2024-12-05T10:01:40.502935"
|
46 |
-
},
|
47 |
-
{
|
48 |
-
"id": "c79df17e-2bb2-4253-ae26-f7cc6ab21265",
|
49 |
-
"model": "facebook/wav2vec2-xlsr-53-espeak-cv-ft",
|
50 |
-
"subset": "timit-test",
|
51 |
-
"submission_name": "xlsr 53 model",
|
52 |
-
"github_url": "https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md",
|
53 |
-
"status": "completed",
|
54 |
-
"submitted_at": "2024-12-05T10:18:37.408664"
|
55 |
-
},
|
56 |
-
{
|
57 |
-
"id": "f36060e6-a746-44dc-a527-54995b270053",
|
58 |
-
"model": "ginic/hyperparam_tuning_1_wav2vec2-large-xlsr-buckeye-ipa",
|
59 |
-
"subset": "timit-test",
|
60 |
-
"submission_name": "ginic model wav2vec2 finetuned on buckeye",
|
61 |
-
"github_url": "https://huggingface.co/ginic/vary_individuals_old_only_1_wav2vec2-large-xlsr-buckeye-ipa",
|
62 |
-
"status": "completed",
|
63 |
-
"submitted_at": "2024-12-05T10:36:02.340422"
|
64 |
-
},
|
65 |
-
{
|
66 |
-
"id": "abf6c247-9faf-46ef-b0fa-25f2669da922",
|
67 |
-
"model": "KoelLabs/xlsr-timit-a0",
|
68 |
-
"subset": "timit-test",
|
69 |
-
"submission_name": "Koel Labs early version of finetuned model ",
|
70 |
-
"github_url": "https://github.com/KoelLabs/ML",
|
71 |
-
"status": "processing",
|
72 |
-
"submitted_at": "2024-12-05T11:08:23.663553"
|
73 |
-
},
|
74 |
-
{
|
75 |
-
"id": "47d56349-8111-4bda-a47f-e007dbedd36d",
|
76 |
-
"model": "KoelLabs/xlsr-timit-a0",
|
77 |
-
"subset": "timit-test",
|
78 |
-
"submission_name": "koel labs initial ",
|
79 |
-
"github_url": "https://github.com/KoelLabs/ML/",
|
80 |
-
"status": "completed",
|
81 |
-
"submitted_at": "2024-12-12T15:28:12.923626"
|
82 |
-
},
|
83 |
-
{
|
84 |
-
"id": "51dd5735-63bd-4fe5-a588-c0fc079076e0",
|
85 |
-
"model": "KoelLabs/xlsr-timit-a0",
|
86 |
-
"subset": "timit-test",
|
87 |
-
"submission_name": "koel labs initial ",
|
88 |
-
"github_url": "https://github.com/KoelLabs/ML/",
|
89 |
-
"status": "completed",
|
90 |
-
"submitted_at": "2024-12-12T15:53:07.620070"
|
91 |
-
},
|
92 |
-
{
|
93 |
-
"id": "2e592612-ca38-4afb-a6a0-3c870b288960",
|
94 |
-
"model": "snu-nia-12/wav2vec2-large_nia12_phone-ipa_english",
|
95 |
-
"subset": "timit-test",
|
96 |
-
"submission_name": "wav2vec2 ipa eng ",
|
97 |
-
"github_url": "",
|
98 |
-
"status": "completed",
|
99 |
-
"submitted_at": "2024-12-18T21:41:21.861322"
|
100 |
-
},
|
101 |
-
{
|
102 |
-
"id": "ac4cbe86-4dbe-4929-8f76-4d2052e0acf1",
|
103 |
-
"model": "vitouphy/wav2vec2-xls-r-300m-timit-phoneme",
|
104 |
-
"subset": "timit-test",
|
105 |
-
"submission_name": "fine-tuned version of facebook/wav2vec2-xls-r-300m on the Timit dataset",
|
106 |
-
"github_url": "https://www.kaggle.com/code/vitouphy/phoneme-recognition-with-wav2vec2",
|
107 |
-
"status": "processing",
|
108 |
-
"submitted_at": "2024-12-18T22:09:03.412372"
|
109 |
-
},
|
110 |
-
{
|
111 |
-
"id": "d38e65ce-75b5-4dbf-8ade-bff6a5803790",
|
112 |
-
"model": "vitouphy/wav2vec2-xls-r-300m-timit-phoneme",
|
113 |
-
"subset": "timit-test",
|
114 |
-
"submission_name": "fine-tuned version of facebook/wav2vec2-xls-r-300m on the Timit dataset",
|
115 |
-
"github_url": "https://www.kaggle.com/code/vitouphy/phoneme-recognition-with-wav2vec2",
|
116 |
-
"status": "completed",
|
117 |
-
"submitted_at": "2024-12-18T22:19:46.817373"
|
118 |
-
},
|
119 |
-
{
|
120 |
-
"id": "2839c0c6-8f3b-426e-9eb7-04b6e133dc47",
|
121 |
-
"model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa-plus-2000",
|
122 |
-
"subset": "timit-test",
|
123 |
-
"submission_name": "wav2vec2 model",
|
124 |
-
"github_url": "https://huggingface.co/ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
|
125 |
-
"status": "completed",
|
126 |
-
"submitted_at": "2024-12-18T22:55:36.734691"
|
127 |
-
},
|
128 |
-
{
|
129 |
-
"id": "59afc37a-0072-44dd-a02a-0cf47d89c120",
|
130 |
-
"model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
|
131 |
-
"subset": "timit-test",
|
132 |
-
"submission_name": "wav2vec2 non-english transcription",
|
133 |
-
"github_url": "https://huggingface.co/ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
|
134 |
-
"status": "completed",
|
135 |
-
"submitted_at": "2024-12-18T23:47:03.488337"
|
136 |
-
},
|
137 |
-
{
|
138 |
-
"id": "e57eda9d-7a1d-4b41-9d47-a3d3839cac8b",
|
139 |
-
"model": "ginic/gender_split_70_female_4_wav2vec2-large-xlsr-buckeye-ipa",
|
140 |
-
"subset": "timit-test",
|
141 |
-
"submission_name": "phonetic transcription with the Buckeye corpus, from xlsr-53 model ",
|
142 |
-
"github_url": "https://github.com/ginic/multipa/tree/buckeye_experiments",
|
143 |
-
"status": "failed",
|
144 |
-
"submitted_at": "2024-12-19T11:48:26.415322",
|
145 |
-
"error": "Evaluation failed: (MaxRetryError(\"HTTPSConnectionPool(host='cdn-lfs-us-1.hf.co', port=443): Max retries exceeded with url: /repos/a4/b1/a4b11f4627350048e021a84d10b89320db54e02c54b2a9366228f8a05cda220b/120f5bc04d1df15143033c93e3ef358981775b529f17e0db11e58a1b80754e67?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27model.safetensors%3B+filename%3D%22model.safetensors%22%3B&Expires=1734889736&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTczNDg4OTczNn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zL2E0L2IxL2E0YjExZjQ2MjczNTAwNDhlMDIxYTg0ZDEwYjg5MzIwZGI1NGUwMmM1NGIyYTkzNjYyMjhmOGEwNWNkYTIyMGIvMTIwZjViYzA0ZDFkZjE1MTQzMDMzYzkzZTNlZjM1ODk4MTc3NWI1MjlmMTdlMGRiMTFlNThhMWI4MDc1NGU2Nz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoifV19&Signature=kfPD6ymEJuVvFZyuN3qL3xk4YJlpI5dqHgON4wJY-Mppwlp6x4Dw7cWdjEkJvMRF-bDuzNWQ3BEJPbsYouVW9WZMucDmxo38UwxSzIBhfWQxCYiHdUWuQPkypDUkI1mR3vbnCFQFXLiMQ2CgwWQz7q66OjIyq3suA00mhL2WcL8wvtovrfoEOkboEXCHCNLprfpoHpfoyfo~VS9~kmm61GN6SWbc9lzASIuT5FLkn~BJ6h405MgutQpNvrR4SHVLftk7rBmY8TAB3re5D0-9qFrMYb2Tk~9RKT3nxSNbgZVcEXzA5rYskcuGsrHoTuTTZ-NSW69K2M0IeivzFWTLNQ__&Key-Pair-Id=K24J24Z295AEI9 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x280544190>: Failed to establish a new connection: [Errno 51] Network is unreachable'))\"), '(Request ID: 14c9cc7c-47ee-47ae-b473-f4add807d233)')"
|
146 |
-
},
|
147 |
-
{
|
148 |
-
"id": "5517f6b2-6a76-4a2d-a6ce-33446f390c3b",
|
149 |
-
"model": "ginic/gender_split_70_female_4_wav2vec2-large-xlsr-buckeye-ipa",
|
150 |
-
"subset": "timit-test",
|
151 |
-
"submission_name": "phonetic transcription with the Buckeye corpus, from xlsr-53 model",
|
152 |
-
"github_url": "https://github.com/ginic/multipa/tree/buckeye_experiments",
|
153 |
-
"status": "completed",
|
154 |
-
"submitted_at": "2024-12-20T13:29:37.327317"
|
155 |
-
},
|
156 |
-
{
|
157 |
-
"id": "c2139f96-e79e-4f25-a525-aa039f65555f",
|
158 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.9.2WithoutSpaces",
|
159 |
-
"subset": "timit-test",
|
160 |
-
"submission_name": "phonetic transcription",
|
161 |
-
"github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces",
|
162 |
-
"status": "completed",
|
163 |
-
"submitted_at": "2024-12-20T14:01:35.626112"
|
164 |
-
},
|
165 |
-
{
|
166 |
-
"id": "d146f1f1-6e6e-4b28-9420-c652ae9a1002",
|
167 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
|
168 |
-
"subset": "timit-test",
|
169 |
-
"submission_name": "Jubliano xlsr model",
|
170 |
-
"github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-nl1.1",
|
171 |
-
"status": "completed",
|
172 |
-
"submitted_at": "2024-12-20T15:08:45.949389"
|
173 |
-
},
|
174 |
-
{
|
175 |
-
"id": "265c5859-e7ba-492d-a6c9-45733dc17c99",
|
176 |
-
"model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
|
177 |
-
"subset": "timit-test",
|
178 |
-
"submission_name": "Jubliano xlsr model",
|
179 |
-
"github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-nl1.1",
|
180 |
-
"status": "completed",
|
181 |
-
"submitted_at": "2024-12-20T15:26:27.706187"
|
182 |
-
},
|
183 |
-
{
|
184 |
-
"id": "e297dfde-95e5-462b-a6e5-8fa43bc30bc0",
|
185 |
-
"model": "speech31/wavlm-large-english-ipa",
|
186 |
-
"subset": "timit-test",
|
187 |
-
"submission_name": "speech31 phoneme transcription english",
|
188 |
-
"github_url": "https://huggingface.co/speech31/wavlm-large-english-ipa",
|
189 |
-
"status": "completed",
|
190 |
-
"submitted_at": "2024-12-20T15:56:25.445806"
|
191 |
-
},
|
192 |
-
{
|
193 |
-
"id": "efe95f71-05e3-485d-8e0c-1823a3037cf4",
|
194 |
-
"model": "speech31/wavlm-large-english-ipa",
|
195 |
-
"subset": "timit-test",
|
196 |
-
"submission_name": "speech31 phoneme transcription english",
|
197 |
-
"github_url": "https://huggingface.co/speech31/wavlm-large-english-ipa",
|
198 |
-
"status": "completed",
|
199 |
-
"submitted_at": "2024-12-20T16:13:24.099308"
|
200 |
-
},
|
201 |
-
{
|
202 |
-
"id": "4b2ae2fc-fe2f-4f8b-9e8f-25c0bae13c0d",
|
203 |
-
"model": "speech31/XLS-R-300m-english-ipa",
|
204 |
-
"subset": "timit-test",
|
205 |
-
"submission_name": "speech31 xlsr model",
|
206 |
-
"github_url": "https://huggingface.co/speech31/XLS-R-300m-english-ipa",
|
207 |
-
"status": "completed",
|
208 |
-
"submitted_at": "2024-12-20T16:33:23.864360"
|
209 |
-
},
|
210 |
-
{
|
211 |
-
"id": "33d387c0-703c-415d-b8e2-81cea87a2146",
|
212 |
-
"model": "speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
|
213 |
-
"subset": "timit-test",
|
214 |
-
"submission_name": "model is a fine-tuned version of facebook/wav2vec2-large on the TIMIT dataset",
|
215 |
-
"github_url": "https://huggingface.co/speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
|
216 |
-
"status": "completed",
|
217 |
-
"submitted_at": "2024-12-20T16:52:07.883839"
|
218 |
-
},
|
219 |
-
{
|
220 |
-
"id": "c89bcefc-3884-435a-a54c-24297fe6f041",
|
221 |
-
"model": "speech31/wav2vec2-large-TIMIT-IPA2",
|
222 |
-
"subset": "timit-test",
|
223 |
-
"submission_name": "fine-tuned version of facebook/wav2vec2-large on the None dataset",
|
224 |
-
"github_url": "https://huggingface.co/speech31/wav2vec2-large-TIMIT-IPA2",
|
225 |
-
"status": "completed",
|
226 |
-
"submitted_at": "2024-12-20T21:54:38.559569"
|
227 |
-
},
|
228 |
-
{
|
229 |
-
"id": "81fa94f8-94ae-4601-952c-24abaddaf691",
|
230 |
-
"model": "ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
|
231 |
-
"subset": "timit-test",
|
232 |
-
"submission_name": "ginic model, facebook/wav2vec2-large-xlsr-53 fine tuned",
|
233 |
-
"github_url": "https://huggingface.co/ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
|
234 |
-
"status": "completed",
|
235 |
-
"submitted_at": "2024-12-21T01:15:41.870875"
|
236 |
-
}
|
237 |
-
]
app/tasks.py
CHANGED
@@ -1,224 +1,117 @@
|
|
1 |
-
# This module handles the task queue
|
2 |
|
3 |
-
import
|
4 |
-
import
|
|
|
5 |
from datetime import datetime
|
6 |
-
from pathlib import Path
|
7 |
-
from typing import Optional
|
8 |
|
9 |
-
import asyncio
|
10 |
-
import pandas as pd
|
11 |
|
12 |
-
from
|
|
|
|
|
|
|
13 |
|
14 |
-
|
15 |
-
CURRENT_DIR = Path(__file__).parent.absolute()
|
16 |
|
17 |
-
# Constants
|
18 |
-
QUEUE_DIR = CURRENT_DIR / "queue"
|
19 |
-
PATHS = {
|
20 |
-
"tasks": QUEUE_DIR / "tasks.json",
|
21 |
-
"results": QUEUE_DIR / "results.json",
|
22 |
-
"leaderboard": QUEUE_DIR / "leaderboard.json",
|
23 |
-
}
|
24 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
25 |
|
26 |
-
# Handle storing and loading data from JSON files
|
27 |
-
class StorageManager:
|
28 |
-
"""Handles all JSON storage operations"""
|
29 |
|
30 |
-
|
31 |
-
self.paths = paths
|
32 |
-
self._ensure_directories()
|
33 |
|
34 |
-
def _ensure_directories(self):
|
35 |
-
"""Ensure all necessary directories and files exist"""
|
36 |
-
for path in self.paths.values():
|
37 |
-
path.parent.mkdir(parents=True, exist_ok=True)
|
38 |
-
if not path.exists():
|
39 |
-
path.write_text("[]")
|
40 |
-
|
41 |
-
def load(self, key: str) -> list:
|
42 |
-
"""Load JSON file"""
|
43 |
-
return json.loads(self.paths[key].read_text())
|
44 |
-
|
45 |
-
def save(self, key: str, data: list):
|
46 |
-
"""Save data to JSON file"""
|
47 |
-
self.paths[key].write_text(
|
48 |
-
json.dumps(data, indent=4, default=str, ensure_ascii=False)
|
49 |
-
)
|
50 |
-
|
51 |
-
def update_task(self, task_id: str, updates: dict):
|
52 |
-
"""Update specific task with new data"""
|
53 |
-
tasks = self.load("tasks")
|
54 |
-
for task in tasks:
|
55 |
-
if task["id"] == task_id:
|
56 |
-
task.update(updates)
|
57 |
-
break
|
58 |
-
self.save("tasks", tasks)
|
59 |
-
|
60 |
-
|
61 |
-
# Initialize storage manager
|
62 |
-
storage_manager = StorageManager(PATHS)
|
63 |
|
|
|
|
|
64 |
|
65 |
-
|
66 |
-
|
67 |
-
|
68 |
-
try:
|
69 |
-
return pd.DataFrame(storage_manager.load("leaderboard"))
|
70 |
-
except Exception as e:
|
71 |
-
print(f"Error loading leaderboard: {e}")
|
72 |
-
return pd.DataFrame()
|
73 |
|
|
|
|
|
|
|
74 |
|
75 |
-
|
76 |
-
"""Return list of evaluation results"""
|
77 |
-
return storage_manager.load("results")
|
78 |
|
79 |
|
80 |
-
def
|
81 |
-
"""
|
82 |
-
return storage_manager.load("tasks")
|
83 |
|
84 |
|
85 |
-
|
86 |
-
|
87 |
-
|
88 |
-
|
|
|
|
|
89 |
|
90 |
-
|
91 |
-
results = get_results()
|
92 |
-
tasks = get_tasks()
|
93 |
-
|
94 |
-
# First try to find by task ID
|
95 |
-
result = next((r for r in results if r["task_id"] == query), None)
|
96 |
-
task = next((t for t in tasks if t["id"] == query), None)
|
97 |
-
|
98 |
-
# If not found, try to find by model name
|
99 |
-
if not result:
|
100 |
-
result = next((r for r in results if r["model"] == query), None)
|
101 |
-
if not task:
|
102 |
-
task = next((t for t in tasks if t["model"] == query), None)
|
103 |
-
|
104 |
-
if result:
|
105 |
-
# If we found results, return them
|
106 |
-
return {
|
107 |
-
"status": "completed",
|
108 |
-
"model": result["model"],
|
109 |
-
"subset": result["subset"],
|
110 |
-
"num_files": result["num_files"],
|
111 |
-
"average_per": result["average_per"],
|
112 |
-
"average_pwed": result["average_pwed"],
|
113 |
-
"detailed_results": result["detailed_results"],
|
114 |
-
"timestamp": result["timestamp"],
|
115 |
-
}
|
116 |
-
elif task:
|
117 |
-
# If we only found task status, return that
|
118 |
-
return task
|
119 |
-
else:
|
120 |
-
return {"error": f"No results found for '{query}'"}
|
121 |
|
122 |
-
except Exception as e:
|
123 |
-
print(f"Error checking status: {e}")
|
124 |
-
return {"error": f"Error checking status: {str(e)}"}
|
125 |
|
|
|
126 |
|
127 |
-
def start_eval_task(
|
128 |
-
model_name: str, submission_name: str, github_url: Optional[str] = None
|
129 |
-
) -> str:
|
130 |
-
"""Start evaluation task in background. Returns task ID that can be used to check status."""
|
131 |
|
132 |
-
|
133 |
-
task_id = str(uuid.uuid4())
|
134 |
-
|
135 |
-
# Create task entry
|
136 |
-
task = {
|
137 |
-
"id": task_id,
|
138 |
-
"model": model_name,
|
139 |
-
"subset": "test",
|
140 |
-
"submission_name": submission_name,
|
141 |
-
"github_url": github_url,
|
142 |
-
"status": "queued",
|
143 |
-
"submitted_at": datetime.now().isoformat(),
|
144 |
-
}
|
145 |
-
|
146 |
-
# Save task
|
147 |
-
tasks = storage_manager.load("tasks")
|
148 |
-
tasks.append(task)
|
149 |
-
storage_manager.save("tasks", tasks)
|
150 |
-
|
151 |
-
# Start evaluation in background
|
152 |
-
asyncio.run(_eval_task(task_id, model_name, submission_name, "test", github_url))
|
153 |
-
|
154 |
-
return task_id
|
155 |
-
|
156 |
-
|
157 |
-
async def _eval_task(
|
158 |
-
task_id: str,
|
159 |
-
model_name: str,
|
160 |
-
submission_name: str,
|
161 |
-
subset: str = "test",
|
162 |
-
github_url: Optional[str] = None,
|
163 |
-
max_samples: Optional[int] = None,
|
164 |
-
):
|
165 |
"""Background task to evaluate model and save updated results"""
|
166 |
try:
|
167 |
# Indicate task is processing
|
168 |
-
|
169 |
|
170 |
# Evaluate model
|
171 |
-
|
172 |
-
|
173 |
-
|
174 |
|
175 |
# Save results
|
176 |
-
|
177 |
-
|
178 |
-
|
179 |
-
|
180 |
-
|
181 |
-
|
182 |
-
|
183 |
-
|
184 |
-
|
185 |
-
|
186 |
-
|
187 |
-
)
|
188 |
-
|
189 |
-
if entry:
|
190 |
-
# Simply update with new scores
|
191 |
-
entry.update(
|
192 |
-
{
|
193 |
-
"task_id": task_id,
|
194 |
-
"average_per": avg_per,
|
195 |
-
"average_pwed": avg_pwed,
|
196 |
-
"model": model_name,
|
197 |
-
"subset": subset,
|
198 |
-
"github_url": github_url,
|
199 |
-
"submission_date": datetime.now().isoformat(),
|
200 |
-
}
|
201 |
)
|
202 |
-
else:
|
203 |
-
leaderboard.append(
|
204 |
-
{
|
205 |
-
"task_id": task_id,
|
206 |
-
"submission_id": str(uuid.uuid4()),
|
207 |
-
"submission_name": submission_name,
|
208 |
-
"model": model_name,
|
209 |
-
"average_per": avg_per,
|
210 |
-
"average_pwed": avg_pwed,
|
211 |
-
"subset": subset,
|
212 |
-
"github_url": github_url,
|
213 |
-
"submission_date": datetime.now().isoformat(),
|
214 |
-
}
|
215 |
-
)
|
216 |
-
|
217 |
-
storage_manager.save("leaderboard", leaderboard)
|
218 |
-
storage_manager.update_task(task_id, {"status": "completed"})
|
219 |
-
print("Evaluation completed successfully")
|
220 |
|
|
|
|
|
221 |
except Exception as e:
|
222 |
-
|
223 |
-
|
224 |
-
storage_manager.update_task(task_id, {"status": "failed", "error": error_msg})
|
|
|
1 |
+
# This module handles the task queue
|
2 |
|
3 |
+
import os
|
4 |
+
import multiprocessing
|
5 |
+
from typing import TypedDict
|
6 |
from datetime import datetime
|
|
|
|
|
7 |
|
|
|
|
|
8 |
|
9 |
+
from metrics import per, fer
|
10 |
+
from datasets import load_from_disk
|
11 |
+
from hf import get_repo_info, add_leaderboard_entry
|
12 |
+
from inference import clear_cache, load_model, transcribe
|
13 |
|
14 |
+
leaderboard_lock = multiprocessing.Lock()
|
|
|
15 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
16 |
|
17 |
+
class Task(TypedDict):
|
18 |
+
status: str
|
19 |
+
display_name: str
|
20 |
+
repo_id: str
|
21 |
+
repo_hash: str
|
22 |
+
repo_last_modified: datetime
|
23 |
+
submission_timestamp: datetime
|
24 |
+
url: str
|
25 |
+
error: str | None
|
26 |
|
|
|
|
|
|
|
27 |
|
28 |
+
tasks: list[Task] = []
|
|
|
|
|
29 |
|
30 |
|
31 |
+
def get_status(query: str) -> dict:
|
32 |
+
"""Check status of an evaluation task by repo_id or repo_hash"""
|
33 |
|
34 |
+
query = query.strip().lower()
|
35 |
+
if not query:
|
36 |
+
return {"error": "Please enter a model id or task id"}
|
|
|
|
|
|
|
|
|
|
|
37 |
|
38 |
+
for task in reversed(tasks):
|
39 |
+
if task["repo_id"].lower() == query or task["repo_hash"].lower() == query:
|
40 |
+
return dict(task)
|
41 |
|
42 |
+
return {"error": f"No results found for '{query}'"}
|
|
|
|
|
43 |
|
44 |
|
45 |
+
def start_eval_task(display_name: str, repo_id: str, url: str) -> str:
|
46 |
+
"""Start evaluation task in background. Returns the repo hash, which can be used to check status."""
|
|
|
47 |
|
48 |
+
repo_hash, last_modified = get_repo_info(repo_id)
|
49 |
+
# TODO: check if hash is different from the most recent submission if any for repo_id, otherwise don't recompute
|
50 |
+
task = Task(
|
51 |
+
status="submitted",
|
52 |
+
display_name=display_name,
|
53 |
+
repo_id=repo_id,
|
54 |
+
repo_hash=repo_hash,
|
55 |
+
repo_last_modified=last_modified,
|
56 |
+
submission_timestamp=datetime.now(),
|
57 |
+
url=url,
|
58 |
+
error=None,
|
59 |
+
)
|
60 |
|
61 |
+
manager = multiprocessing.Manager()
|
62 |
+
task_proxy = manager.dict(task)
|
63 |
+
tasks.append(task_proxy) # type: ignore
|
64 |
+
multiprocessing.Process(
|
65 |
+
target=_eval_task, args=[task_proxy, leaderboard_lock]
|
66 |
+
).start()
|
67 |
|
68 |
+
return repo_hash
|
69 |
|
|
|
|
|
|
|
70 |
|
71 |
+
test_ds = load_from_disk(os.path.join(os.path.dirname(__file__), "data", "test"))
|
72 |
|
|
|
|
|
|
|
|
|
73 |
|
74 |
+
def _eval_task(task: Task, leaderboard_lock):
|
75 |
"""Background task to evaluate model and save updated results"""
|
76 |
try:
|
77 |
# Indicate task is processing
|
78 |
+
task["status"] = "evaluating"
|
79 |
|
80 |
# Evaluate model
|
81 |
+
average_per = 0
|
82 |
+
average_fer = 0
|
83 |
+
per_dataset_fers = {}
|
84 |
+
|
85 |
+
clear_cache()
|
86 |
+
model, processor = load_model(task["repo_id"])
|
87 |
+
for row in test_ds:
|
88 |
+
transcript = transcribe(row["audio"]["array"], model, processor) # type: ignore
|
89 |
+
row_per = per(transcript, row["ipa"]) # type: ignore
|
90 |
+
row_fer = fer(transcript, row["ipa"]) # type: ignore
|
91 |
+
average_per += row_per
|
92 |
+
average_fer += row_fer
|
93 |
+
per_dataset_fers[row["dataset"]] = per_dataset_fers.get(row["dataset"], 0) + row_fer # type: ignore
|
94 |
+
for key in per_dataset_fers.keys():
|
95 |
+
per_dataset_fers[key] /= len(test_ds.filter(lambda r: r["dataset"] == key))
|
96 |
+
average_per /= len(test_ds)
|
97 |
+
average_fer /= len(test_ds)
|
98 |
|
99 |
# Save results
|
100 |
+
with leaderboard_lock:
|
101 |
+
add_leaderboard_entry(
|
102 |
+
display_name=task["display_name"],
|
103 |
+
repo_id=task["repo_id"],
|
104 |
+
repo_hash=task["repo_hash"],
|
105 |
+
repo_last_modified=task["repo_last_modified"],
|
106 |
+
submission_timestamp=task["submission_timestamp"],
|
107 |
+
average_per=average_per,
|
108 |
+
average_fer=average_fer,
|
109 |
+
url=task["url"],
|
110 |
+
per_dataset_fers=per_dataset_fers,
|
111 |
)
|
112 |
|
113 |
+
# Mark task as complete
|
114 |
+
task["status"] = "completed"
|
115 |
except Exception as e:
|
116 |
+
task["status"] = "failed"
|
117 |
+
task["error"] = str(e)
|
|
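Note on the averaging step in `_eval_task` above: each per-dataset FER sum is divided by a row count obtained from `test_ds.filter(...)`, which scans the test set once per dataset. A minimal sketch of an alternative, assuming only that every row yields a dataset name and a score, keeps a running count next to each running sum so the averages fall out of a single pass. The helper `mean_by_group` and the sample scores are hypothetical and are not part of this commit.

```python
from collections import defaultdict


def mean_by_group(rows):
    """Return (overall_mean, {group: mean}) for (group, score) pairs in one pass."""
    totals = defaultdict(float)  # running sum of scores per group
    counts = defaultdict(int)    # number of rows seen per group
    for group, score in rows:
        totals[group] += score
        counts[group] += 1

    n = sum(counts.values())
    if n == 0:
        # Guard added for the sketch; the commit does not handle an empty test set
        raise ValueError("no rows to average")

    overall = sum(totals.values()) / n
    per_group = {g: totals[g] / counts[g] for g in totals}
    return overall, per_group


# Hypothetical scores standing in for per-row FER values
rows = [("TIMIT", 0.5), ("TIMIT", 0.25), ("PSST", 1.0)]
overall, per_group = mean_by_group(rows)
```

In `_eval_task` the same idea would amount to incrementing a per-dataset counter inside the existing `for row in test_ds` loop and dividing by that counter afterwards.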
requirements.txt
CHANGED
@@ -1,11 +1,17 @@
|
|
1 |
-
#
|
2 |
-
|
3 |
-
|
4 |
-
transformers==4.44.2
|
5 |
-
huggingface_hub==0.25.1
|
6 |
-
gradio==5.12.0
|
7 |
-
panphon==0.21.2
|
8 |
|
9 |
# Data processing
|
10 |
pandas==2.0.3
|
11 |
numpy==1.25.2
|
1 |
+
# Huggingface
|
2 |
+
huggingface_hub==0.34.4
|
3 |
+
datasets==4.0.0
|
|
|
|
|
|
|
|
|
4 |
|
5 |
# Data processing
|
6 |
pandas==2.0.3
|
7 |
numpy==1.25.2
|
8 |
+
panphon==0.21.2
|
9 |
+
torch==2.8.0
|
10 |
+
torchaudio==2.8.0
|
11 |
+
torchcodec==0.6.0
|
12 |
+
transformers==4.56.0
|
13 |
+
phonemizer==3.3.0
|
14 |
+
|
15 |
+
# UI
|
16 |
+
gradio==5.12.0
|
17 |
+
protobuf==6.32.0
|
requirements_lock.txt
CHANGED
@@ -1,28 +1,100 @@
|
|
1 |
-
|
2 |
-
|
3 |
-
|
4 |
-
|
5 |
-
|
6 |
-
|
7 |
-
|
8 |
-
|
9 |
idna==3.10
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
|
21 |
-
|
22 |
-
|
23 |
-
|
24 |
tqdm==4.67.1
|
25 |
-
transformers==4.
|
26 |
-
|
27 |
-
|
28 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
aiofiles==23.2.1
|
2 |
+
aiohappyeyeballs==2.6.1
|
3 |
+
aiohttp==3.12.15
|
4 |
+
aiosignal==1.4.0
|
5 |
+
annotated-types==0.7.0
|
6 |
+
anyio==4.10.0
|
7 |
+
async-timeout==5.0.1
|
8 |
+
attrs==25.3.0
|
9 |
+
babel==2.17.0
|
10 |
+
certifi==2025.8.3
|
11 |
+
charset-normalizer==3.4.3
|
12 |
+
click==8.2.1
|
13 |
+
colorama==0.4.6
|
14 |
+
csvw==3.5.1
|
15 |
+
datasets==4.0.0
|
16 |
+
dill==0.3.8
|
17 |
+
dlinfo==2.0.0
|
18 |
+
editdistance==0.8.1
|
19 |
+
exceptiongroup==1.3.0
|
20 |
+
fastapi==0.116.1
|
21 |
+
ffmpy==0.6.1
|
22 |
+
filelock==3.19.1
|
23 |
+
frozenlist==1.7.0
|
24 |
+
fsspec==2025.3.0
|
25 |
+
gradio==5.12.0
|
26 |
+
gradio_client==1.5.4
|
27 |
+
h11==0.16.0
|
28 |
+
hf-xet==1.1.9
|
29 |
+
httpcore==1.0.9
|
30 |
+
httpx==0.28.1
|
31 |
+
huggingface-hub==0.34.4
|
32 |
idna==3.10
|
33 |
+
isodate==0.7.2
|
34 |
+
Jinja2==3.1.6
|
35 |
+
joblib==1.5.2
|
36 |
+
jsonschema==4.25.1
|
37 |
+
jsonschema-specifications==2025.4.1
|
38 |
+
language-tags==1.2.0
|
39 |
+
markdown-it-py==4.0.0
|
40 |
+
MarkupSafe==2.1.5
|
41 |
+
mdurl==0.1.2
|
42 |
+
mpmath==1.3.0
|
43 |
+
multidict==6.6.4
|
44 |
+
multiprocess==0.70.16
|
45 |
+
munkres==1.1.4
|
46 |
+
networkx==3.4.2
|
47 |
+
numpy==1.25.2
|
48 |
+
orjson==3.11.3
|
49 |
+
packaging==25.0
|
50 |
+
pandas==2.0.3
|
51 |
+
panphon==0.21.2
|
52 |
+
phonemizer==3.3.0
|
53 |
+
pillow==11.3.0
|
54 |
+
propcache==0.3.2
|
55 |
+
protobuf==6.32.0
|
56 |
+
pyarrow==21.0.0
|
57 |
+
pydantic==2.11.7
|
58 |
+
pydantic_core==2.33.2
|
59 |
+
pydub==0.25.1
|
60 |
+
Pygments==2.19.2
|
61 |
+
pyparsing==3.2.3
|
62 |
+
python-dateutil==2.9.0.post0
|
63 |
+
python-multipart==0.0.20
|
64 |
+
pytz==2025.2
|
65 |
+
PyYAML==6.0.2
|
66 |
+
rdflib==7.1.4
|
67 |
+
referencing==0.36.2
|
68 |
+
regex==2025.9.1
|
69 |
+
requests==2.32.5
|
70 |
+
rfc3986==1.5.0
|
71 |
+
rich==14.1.0
|
72 |
+
rpds-py==0.27.1
|
73 |
+
ruff==0.12.11
|
74 |
+
safehttpx==0.1.6
|
75 |
+
safetensors==0.6.2
|
76 |
+
segments==2.3.0
|
77 |
+
semantic-version==2.10.0
|
78 |
+
shellingham==1.5.4
|
79 |
+
six==1.17.0
|
80 |
+
sniffio==1.3.1
|
81 |
+
starlette==0.47.3
|
82 |
+
sympy==1.14.0
|
83 |
+
tokenizers==0.22.0
|
84 |
+
tomlkit==0.13.3
|
85 |
+
torch==2.8.0
|
86 |
+
torchaudio==2.8.0
|
87 |
+
torchcodec==0.6.0
|
88 |
tqdm==4.67.1
|
89 |
+
transformers==4.56.0
|
90 |
+
typer==0.17.3
|
91 |
+
typing-inspection==0.4.1
|
92 |
+
typing_extensions==4.15.0
|
93 |
+
tzdata==2025.2
|
94 |
+
unicodecsv==0.14.1
|
95 |
+
uritemplate==4.2.0
|
96 |
+
urllib3==2.5.0
|
97 |
+
uvicorn==0.35.0
|
98 |
+
websockets==14.2
|
99 |
+
xxhash==3.5.0
|
100 |
+
yarl==1.20.1
|
scripts/download_data_curl.sh
DELETED
@@ -1,3 +0,0 @@
|
|
1 |
-
# install ./.data/TIMIT.zip from https://www.kaggle.com/datasets/mfekadu/darpa-timit-acousticphonetic-continuous-speech?resource=download
|
2 |
-
curl -L -o ./queue/data/TIMIT.zip\
|
3 |
-
https://www.kaggle.com/api/v1/datasets/download/mfekadu/darpa-timit-acousticphonetic-continuous-speech
|
|
|
|
|
|
|
|
scripts/download_data_lfs.sh
DELETED
@@ -1,2 +0,0 @@
|
|
1 |
-
# Download the TIMIT.zip dataset
|
2 |
-
git lfs pull --include="./queue/data/TIMIT.zip"
|
|
|
|
|
|
scripts/install.sh
DELETED
@@ -1,19 +0,0 @@
|
|
1 |
-
# Create a virtual environment with Python 3.10
|
2 |
-
python3.10 -m venv venv
|
3 |
-
|
4 |
-
# Activate the virtual environment
|
5 |
-
. ./venv/bin/activate
|
6 |
-
|
7 |
-
# Install the required dependencies
|
8 |
-
pip install -r requirements_lock.txt
|
9 |
-
|
10 |
-
# Download data
|
11 |
-
# check if git lfs is installed and run the appropriate script, otherwise run the curl script
|
12 |
-
if [ -x "$(command -v git-lfs)" ]; then
|
13 |
-
. ./scripts/download_data_lfs.sh
|
14 |
-
else
|
15 |
-
. ./scripts/download_data_curl.sh
|
16 |
-
fi
|
17 |
-
|
18 |
-
# Deactivate the virtual environment
|
19 |
-
deactivate
|
scripts/run-dev.sh
CHANGED
@@ -1,8 +1,2 @@
|
|
1 |
-
# Activate the virtual environment
|
2 |
-
. ./venv/bin/activate
|
3 |
-
|
4 |
# Run the app with auto-reload enabled
|
5 |
gradio app/app.py
|
6 |
-
|
7 |
-
# Deactivate the virtual environment
|
8 |
-
deactivate
|
|
|
|
|
|
|
|
|
1 |
# Run the app with auto-reload enabled
|
2 |
gradio app/app.py
|
|
|
|
|
|
scripts/run-prod.sh
CHANGED
@@ -1,8 +1,2 @@
|
|
1 |
-
# Activate the virtual environment
|
2 |
-
. ./venv/bin/activate
|
3 |
-
|
4 |
# Run the app without auto-reload
|
5 |
python app/app.py
|
6 |
-
|
7 |
-
# Deactivate the virtual environment
|
8 |
-
deactivate
|
|
|
|
|
|
|
|
|
1 |
# Run the app without auto-reload
|
2 |
python app/app.py
|
|
|
|
|
|
scripts/sample_test_set.py
ADDED
@@ -0,0 +1,33 @@
|
|
1 |
+
#!/usr/bin/env python3
|
2 |
+
|
3 |
+
import os
|
4 |
+
from datasets import load_dataset, concatenate_datasets, Dataset
|
5 |
+
|
6 |
+
SEED = 42
|
7 |
+
SAMPLE_SIZE = 100
|
8 |
+
|
9 |
+
testsets: list[tuple[str, Dataset]] = [
|
10 |
+
("TIMIT", load_dataset("KoelLabs/TIMIT")["test"]),
|
11 |
+
("EpaDB", load_dataset("KoelLabs/EpaDB")["test"]),
|
12 |
+
("PSST", load_dataset("KoelLabs/PSST")["test"]),
|
13 |
+
("SpeechOcean", load_dataset("KoelLabs/SpeechOceanNoTH")["test"]),
|
14 |
+
("ISLE", load_dataset("KoelLabs/ISLE")["train"]),
|
15 |
+
] # type: ignore
|
16 |
+
|
17 |
+
all_datasets = []
|
18 |
+
for name, test_ds in testsets:
|
19 |
+
shuffled_ds = test_ds.shuffle(seed=SEED)
|
20 |
+
sample_ds = shuffled_ds.select(range(SAMPLE_SIZE))
|
21 |
+
sample_ds = sample_ds.add_column("dataset", [name] * len(sample_ds)) # type: ignore
|
22 |
+
sample_ds = sample_ds.remove_columns(
|
23 |
+
[
|
24 |
+
col
|
25 |
+
for col in sample_ds.column_names
|
26 |
+
if col not in ["audio", "ipa", "dataset"]
|
27 |
+
]
|
28 |
+
)
|
29 |
+
all_datasets.append(sample_ds)
|
30 |
+
combined_ds: Dataset = concatenate_datasets(all_datasets)
|
31 |
+
|
32 |
+
os.makedirs(os.path.join("app", "data"), exist_ok=True)
|
33 |
+
combined_ds.save_to_disk(os.path.join("app", "data", "test"))
|
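The sampling procedure in `scripts/sample_test_set.py` above (shuffle each source with a fixed seed, keep the first `SAMPLE_SIZE` rows, tag each row with its dataset name, drop every other column, then concatenate) can be sketched without the `datasets` library. This is an illustration under stated assumptions rather than the script's implementation: plain lists of dicts stand in for `Dataset` objects, `random.Random` stands in for `Dataset.shuffle` (so the selected rows would differ from the real script's), and `sample_rows`, the toy rows, and the explicit size check are hypothetical additions.

```python
import random

SEED = 42
SAMPLE_SIZE = 2  # the script uses 100; kept small for the toy data below


def sample_rows(name, rows, size=SAMPLE_SIZE, seed=SEED):
    """Seeded sample of `size` rows, keeping only audio/ipa and adding a dataset tag."""
    if len(rows) < size:
        # Explicit check added for the sketch; not present in the script
        raise ValueError(f"{name} has only {len(rows)} rows, need {size}")
    shuffled = list(rows)                  # copy so the input is not reordered
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order each run
    return [
        {"audio": r["audio"], "ipa": r["ipa"], "dataset": name}
        for r in shuffled[:size]
    ]


# Toy rows with an extra column that should be dropped
toy = [{"audio": f"clip{i}.wav", "ipa": "a", "speaker": i} for i in range(5)]
combined = sample_rows("TIMIT", toy) + sample_rows("PSST", toy)
```

Fixing the seed is what makes the saved test set reproducible: rerunning the sampling over unchanged sources yields the same rows.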