SanderGi committed
Commit c2e60bb · 1 Parent(s): 007a01f

fix and make functional, add more datasets
CONTRIBUTING.md CHANGED
@@ -1,7 +1,7 @@
 # Contributing to Koel Labs - IPA Transcription EN
 👍🎉 First off, thanks for taking the time to contribute! 🎉👍

-These are the specific contributing guidelines for the English IPA transcription leaderboard. Checkout our [general contributing guidelines here](https://github.com/KoelLabs/.github/blob/main/CONTRIBUTING.md).
+These are the specific contributing guidelines for the English IPA transcription leaderboard. Check out our [general contributing guidelines here](https://github.com/KoelLabs/.github/blob/main/CONTRIBUTING.md).

 ## Where to Start
DEVELOPMENT.md CHANGED
@@ -2,47 +2,69 @@

 ## Design Decisions

-We specifically opt for a single-space leaderboard for simplicity. We solve the issue of keeping the Gradio UI interactive while models are evaluating by using background tasks instead of a separate space.
+We specifically opt for a single-space leaderboard for simplicity. We solve the issue of keeping the Gradio UI interactive while models are evaluating by using multiprocessing instead of a separate space. Leaderboard entries are persisted in a Hugging Face Dataset to avoid paying for persistent storage. Tasks are deliberately ephemeral.

-## Setup
+## Local Setup

 ### Prerequisites

-* Python 3.10
-* Git
+* [Python 3.10](https://www.python.org/downloads/release/python-31017/)
+* [Git](https://git-scm.com/downloads)
 * A love for speech recognition! 🎤

 ### Quick Installation

+0. Make sure git-lfs is installed (https://git-lfs.com):
+   ```bash
+   git lfs install
+   ```
+
 1. Clone this repository:
    ```bash
-   GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN
-   cd IPA-Transcription-EN
+   git clone https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN
    ```

-2. Set up your environment and download data:
+2. Set up your environment:
    ```bash
-   . ./scripts/install.sh
+   # Create a virtual environment with Python 3.10
+   python3.10 -m venv venv
+
+   # Activate the virtual environment
+   . ./venv/bin/activate
+   # use `deactivate` to exit out of it
+
+   # Install the required dependencies
+   pip install -r requirements_lock.txt
+
+   # Add a HF_TOKEN with access to your backing dataset (in app/hf.py) and any models you want to be able to run
+   huggingface-cli login
    ```

-3. Launch the leaderboard in development mode (auto-reloads on code changes):
+3. Launch the leaderboard:
    ```bash
-   . ./scripts/run-dev.sh
+   . ./scripts/run-dev.sh   # development mode (auto-reloads)
+   . ./scripts/run-prod.sh  # production mode (no auto-reloads)
    ```

 4. Visit `http://localhost:7860` in your browser and see the magic! ✨

+### Adding New Datasets
+
+The datasets are pre-processed into a single dataset stored in `app/data/test` with three columns: audio (16 kHz), ipa, and dataset (original source). This is done using the `scripts/sample_test_set.py` file. To add new datasets, add them to this script. Beware that existing leaderboard entries will need to be recalculated. You can do this locally by accessing the dataset corresponding to `LEADERBOARD_ID` stored in `app/hf.py`.
+
-## Adding/Removing Dependencies
+### Adding/Removing Dependencies
 0. Activate the virtual environment with `. ./venv/bin/activate`
 1. Add the dependency to `requirements.txt` (or remove it)
-2. Make sure you have no unused dependencies with `pipx run deptry .`
+2. Make sure you have no unused dependencies with `pipx run deptry .` (if necessary, `python -m pip install pipx`)
 3. Run `pip install -r requirements.txt`
 4. Freeze the dependencies with `pip freeze > requirements_lock.txt`

-## Run without reloading
-```bash
-. ./scripts/run-prod.sh
-```
+## Forking Into Your Own Leaderboard
+
+0. Navigate to [the space](https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN), click the three dots on the right, and select `Duplicate this Space`
+1. Modify the `LEADERBOARD_ID` in `app/hf.py` to be some dataset that you own that the new space can use to store data. You don't need to create the dataset, but if you do, it should be empty.
+2. Open the settings in your new space and add a new secret `HF_TOKEN`. You can [create it here](https://huggingface.co/settings/tokens). It just needs read access to all models you want to add to the leaderboard and write access to the private backing dataset specified by `LEADERBOARD_ID`.
+3. Submit some models and enjoy!

 ## File Structure

@@ -56,19 +78,16 @@ IPA-Transcription-EN/
 ├── requirements.txt        # Python dependencies
 ├── requirements_lock.txt   # Locked dependencies
 ├── scripts                 # Helper scripts
-│   ├── install.sh          # Install dependencies and download data
+│   ├── sample_test_set.py  # Compute the combined test set
+│   ├── run-prod.sh         # Run the leaderboard in production mode
 │   └── run-dev.sh          # Run the leaderboard in development mode
 ├── venv                    # Virtual environment
 ├── app/                    # All application code lives here
-│   ├── data/               # Phoneme transcription datasets
-│   ├── queue/              # Stores leaderboard state and task status
-│   │   ├── tasks.json      # Task queue
-│   │   ├── results.json    # Detailed evaluation results
-│   │   └── leaderboard.json # Compact results for leaderboard display
+│   ├── data/               # Phoneme transcription test set
 │   ├── app.py              # Main Gradio UI
-│   ├── tasks.py            # Background tasks for model evaluation
-│   ├── data.py             # Data loading and processing
+│   ├── hf.py               # Interface with the Huggingface API
 │   ├── inference.py        # Model inference
-│   └── phone_metrics.py    # Evaluation metrics
+│   ├── tasks.py            # Background tasks for model evaluation
+│   └── metrics.py          # Evaluation metrics
 └── img/                    # Images for README and other documentation
 ```
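The combined test set described above has one row per utterance with `audio`, `ipa`, and `dataset` columns. A minimal pure-Python sketch of the merge step (the function and the toy rows are hypothetical; the real logic lives in `scripts/sample_test_set.py`):

```python
import random

def combine_datasets(sources: dict, per_source: int, seed: int = 42) -> list:
    """Sample up to `per_source` rows from each source and tag each row
    with the name of the dataset it came from."""
    rng = random.Random(seed)
    combined = []
    for name, rows in sources.items():
        sample = rng.sample(rows, min(per_source, len(rows)))
        for row in sample:
            combined.append({"audio": row["audio"], "ipa": row["ipa"], "dataset": name})
    return combined

# Hypothetical pre-loaded rows; real rows hold 16 kHz waveforms.
sources = {
    "TIMIT": [{"audio": b"...", "ipa": "ɑ"}, {"audio": b"...", "ipa": "æ"}],
    "PSST": [{"audio": b"...", "ipa": "ʌ"}],
}
test_set = combine_datasets(sources, per_source=2)
```

The `dataset` column is what lets the leaderboard report a per-source FER alongside the overall average.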
README.md CHANGED
@@ -13,6 +13,8 @@ thumbnail: >-
 short_description: Speech-to-phoneme leaderboard
 ---

+![Koel Labs logo](img/logo-white.png)
+
 # 🎯 English Phonemic Transcription Leaderboard

 Welcome to the English Phonemic Transcription Leaderboard! This simple leaderboard helps track and compare the performance of different speech-to-phoneme models. Feel free to fork it for your own Hugging Face leaderboards!

@@ -30,13 +32,12 @@ Welcome to the English Phonemic Transcription Leaderboard!

 This leaderboard tracks two key metrics for phonemic transcription models:

-
 * **PER (Phoneme Error Rate)**: How accurately your model converts speech to phonemes
-* **PWED (Phoneme Weighted Edit Distance)**: A more nuanced metric that considers phonemic features
+* **FER (Feature Error Rate)**: A more nuanced metric that considers phonemic features

 Read more about evaluations on our [blog](https://www.koellabs.com/blog/phonemic-transcription-metrics)

-Models are evaluated on the TIMIT speech corpus, a gold standard in speech recognition research.
+Models are evaluated on a variety of English speech: native, non-native, and impaired.

 ## 🚀 Getting Started

@@ -48,7 +49,7 @@ Navigate to the hosted version on [Hugging Face](https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN)

 1. Go to the "Submit Model" tab
 2. Enter your model details:
-   * Model name (e.g., "wav2vec2-phoneme-wizard")
+   * Model ID (e.g., "my-name/wav2vec2-phoneme-wizard")
    * Submission name (e.g., "MyAwesomeModel v1.0")
    * GitHub/Kaggle/HuggingFace URL (optional)
 3. Click Submit and watch your model climb the ranks! 🚀

@@ -56,7 +57,7 @@

 ### Checking Model Status

 1. Navigate to the "Model Status" tab
-2. Enter your model name or task ID
+2. Enter your model ID or task ID
 3. Get real-time updates on your model's evaluation progress

@@ -64,7 +65,7 @@

 The leaderboard shows:

 * Model names and submission details
-* PER and PWED scores (lower is better!)
+* PER and FER scores (lower is better!)
 * Links to model repositories
 * Submission dates

@@ -86,7 +87,7 @@ Want to make this leaderboard even better? We'd love your help!

 * Submit bug fixes
 * Add new features

-Checkout the [CONTRIBUTING.md](CONTRIBUTING.md) for more details.
+Check out the [CONTRIBUTING.md](CONTRIBUTING.md) for more details.

 ## 📝 License

@@ -94,12 +95,6 @@ This project is licensed under the GNU Affero General Public License.

 We retain all rights to the Koel Labs brand, logos, blog posts and website content.

-## 🌟 Acknowledgments
-
-* Thanks to the TIMIT speech corpus for providing evaluation data
-* Shoutout to the [panphon library](https://github.com/dmort27/panphon) for PWED calculations
-* Built with love by Koel Labs 💙
-
 ## 🆘 Need Help?

 Got questions? Found a bug? Want to contribute? [Open an issue](https://huggingface.co/spaces/KoelLabs/IPA-Transcription-EN/discussions) or [reach out to us](mailto:[email protected])! We're here to help make speech recognition evaluation fun and accessible for everyone!

@@ -108,4 +103,4 @@ Remember: Every great model deserves its moment to shine! 🌟

 ---

-Happy Transcribing! 🎤✨
+Happy Transcribing! 🎤✨
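PER, as the README describes it, is a Levenshtein edit distance over phoneme sequences, normalized by the reference length. A minimal sketch of that computation (illustrative only; the leaderboard's actual implementation lives in `app/metrics.py`):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two phoneme strings (one IPA symbol per char)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def per(predicted: str, actual: str) -> float:
    """Phoneme Error Rate: edit distance normalized by reference length."""
    return levenshtein(predicted, actual) / max(len(actual), 1)

print(per("kæt", "kɑt"))  # one substitution over three phonemes → 0.333…
```

FER follows the same shape but weights each substitution by the phonetic-feature distance between the two symbols (via panphon) instead of a flat cost of 1.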
app/app.py CHANGED
@@ -1,50 +1,66 @@
 # This is the main module that handles rendering the Gradio interface.
-
-# Note: gradio will automatically create REST API endpoints for the functions that are used as event handlers in the interface.
+# NOTE: gradio will automatically create REST API endpoints for the functions that are used as event handlers in the interface.

 import gradio as gr
 import pandas as pd

-from tasks import start_eval_task, get_leaderboard_data, get_status
+from tasks import start_eval_task, get_status
+from hf import get_or_create_leaderboard


-def get_latest_leaderboard_html(sort_option: str) -> str:
+def get_latest_leaderboard_html(datasets: list[str], sort_option: str) -> str:
     try:
         # Get the latest leaderboard data
-        df = get_leaderboard_data()
+        df: pd.DataFrame = get_or_create_leaderboard().sort("submission_timestamp", reverse=True).to_pandas()  # type: ignore
+        df = df.drop_duplicates("repo_id", keep="first")
+
+        if len(df) == 0:
+            return "No scores, please submit models for evaluation."

-        # Sort the dataframe so smallest PER or PWED is at the top
-        sort_column = "average_per" if sort_option.lower() == "per" else "average_pwed"
+        # Sort the dataframe so smallest PER or FER is at the top
+        sort_column = "average_per" if sort_option.lower() == "per" else "average_fer"
         df = df.sort_values(by=sort_column, ascending=True)

         # Format the dataframe for HTML display
         df = pd.DataFrame(
             {
-                "Model": df["model"],
-                "Average PER ⬇️": df["average_per"].apply(lambda x: f"{x:.4f}"),
-                "Average PWED ⬇️": df["average_pwed"].apply(lambda x: f"{x:.4f}"),
-                "Link": df["github_url"].apply(
+                "Model": df.apply(
+                    lambda r: f'<a href="https://huggingface.co/{r["repo_id"]}" target="_blank">{r["display_name"]}</a>',
+                    axis=1,
+                ),
+                "Average PER ⬇️": df["average_per"].apply(lambda x: f"{100 * x:.2f}%"),
+            }
+            | {
+                f"{d} FER ⬇️": df["average_fer" if d == "Average" else f"fer_{d}"].apply(
+                    lambda x: f"{100 * x:.2f}%"
+                )
+                for d in datasets
+            }
+            | {
+                "Link": df["url"].apply(
                     lambda x: (
                         f'<a href="{x}" target="_blank">Repository</a>' if x else "N/A"
                     )
                 ),
-                "Submission Date": pd.to_datetime(df["submission_date"]).dt.strftime(
-                    "%Y-%m-%d"
-                ),
+                "Submission Date": pd.to_datetime(
+                    df["submission_timestamp"]
+                ).dt.strftime("%Y-%m-%d"),
             }
         )
         return df.to_html(escape=False, index=False, classes="styled-table")
     except Exception as e:
-        print(f"Error updating leaderboard: {e}")
-        return "Error updating leaderboard"
+        return f"Error updating leaderboard: {type(e).__name__} - {e}"


-def submit_evaluation(model_name: str, submission_name: str, github_url: str) -> str:
-    if not model_name or not submission_name:
+def submit_evaluation(model_id: str, display_name: str, url: str) -> str:
+    model_id = model_id.strip()
+    display_name = display_name.strip()
+    if not model_id or not display_name:
         return "⚠️ Please provide both model name and submission name."

     try:
-        task_id = start_eval_task(model_name, submission_name, github_url)
+        task_id = start_eval_task(display_name, model_id, url)
         return f"✅ Evaluation submitted successfully! Task ID: {task_id}"
     except Exception as e:
         return f"❌ Error: {str(e)}"

@@ -58,7 +74,6 @@ with gr.Blocks(
     margin: 25px 0;
     font-size: 0.9em;
     font-family: sans-serif;
-    box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
 }
 .styled-table thead tr {
     background: linear-gradient(45deg, #092746, #073562, #0A648F);

@@ -75,22 +90,18 @@
 }
 """
 ) as demo:
-    gr.Markdown("# 🎯 English Phonemic Transcription Leaderboard")
+    gr.Markdown("# 🎯 English Speech2IPA Leaderboard")
     gr.Markdown("#### Developed By: [Koel Labs](https://koellabs.com)")
     gr.Markdown(
         """
-    ## Explanation of Metrics
+    ## Evaluation
+    We use two standard metrics:
+
     - **PER (Phoneme Error Rate)**: The Levenshtein distance calculated between phoneme sequences of the predicted and actual transcriptions.
-    - **PWED (Phoneme Weighted Edit Distance)**: Edit distance between the predicted and actual phoneme sequences, weighted by the phonemic feature distance. Method by the [panphon library](https://github.com/dmort27/panphon)
+    - **FER (Feature Error Rate)**: The edit distance between the predicted and actual phoneme sequences, weighted by the phonetic features from [panphon](https://github.com/dmort27/panphon).

-    Read more about evaluations on [our blog](https://www.koellabs.com/blog/phonemic-transcription-metrics)
-    """
-    )
-    gr.Markdown(
-        """
-    ## Test Set Information
-    The test set used for evaluation is from the [TIMIT speech corpus](https://www.kaggle.com/datasets/mfekadu/darpa-timit-acousticphonetic-continuous-speech). The TIMIT corpus is a widely used dataset for speech recognition research.
-
+    Models are evaluated on a variety of English speech: native, non-native, and impaired. Read more about evaluations on [our blog](https://www.koellabs.com/blog/phonemic-transcription-metrics)
+
     ## Compute
     This leaderboard uses the free basic plan (16GB RAM, 2vCPUs) to allow for reproducibility. The evaluation may take several hours to complete. Please be patient and do not submit the same model multiple times.

@@ -100,38 +111,55 @@
     )
     with gr.Tabs():
         with gr.TabItem("🏆 Leaderboard"):
+            dataset_dropdown = gr.Dropdown(
+                choices=["Average", "TIMIT", "EpaDB", "PSST", "SpeechOcean", "ISLE"],
+                value=["Average"],
+                multiselect=True,
+                interactive=True,
+                scale=2,
+                container=False,  # Removes the box around the dropdown
+            )
             with gr.Row(elem_classes="controls-row"):
-                # Controls side by side
                 sort_dropdown = gr.Dropdown(
-                    choices=["PWED", "PER"],
-                    value="PWED",
+                    choices=["FER", "PER"],
+                    value="FER",
                     interactive=True,
                     scale=2,
                     container=False,  # Removes the box around the dropdown
-                    label=None,  # Removes the "Sort by" label
                 )
-                refresh_btn = gr.Button("Refresh 🔄", scale=2)  # Simplified button text
+                refresh_btn = gr.Button("Refresh 🔄", scale=2)

-            leaderboard_html = gr.HTML(get_latest_leaderboard_html(sort_dropdown.value))
+            leaderboard_html = gr.HTML("Loading Leaderboard...")
+            demo.load(
+                fn=get_latest_leaderboard_html,
+                inputs=[dataset_dropdown, sort_dropdown],
+                outputs=leaderboard_html,
+                show_progress="minimal",
+            )
+            dataset_dropdown.change(
+                fn=get_latest_leaderboard_html,
+                inputs=[dataset_dropdown, sort_dropdown],
+                outputs=leaderboard_html,
+            )
             sort_dropdown.change(
                 fn=get_latest_leaderboard_html,
-                inputs=[sort_dropdown],
+                inputs=[dataset_dropdown, sort_dropdown],
                 outputs=leaderboard_html,
             )
             refresh_btn.click(
                 fn=get_latest_leaderboard_html,
-                inputs=[sort_dropdown],
+                inputs=[dataset_dropdown, sort_dropdown],
                 outputs=leaderboard_html,
             )

         with gr.TabItem("📝 Submit Model"):
-            model_name = gr.Textbox(
-                label="Model Name", placeholder="facebook/wav2vec2-lv-60-espeak-cv-ft"
+            model_id = gr.Textbox(
+                label="Model ID", placeholder="facebook/wav2vec2-lv-60-espeak-cv-ft"
             )
-            submission_name = gr.Textbox(
-                label="Submission Name", placeholder="My Model v1.0"
+            display_name = gr.Textbox(
+                label="Submission Name", placeholder="Facebook Wav2Vec2 Espeak 60"
             )
-            github_url = gr.Textbox(
+            url = gr.Textbox(
                 label="Github/Kaggle/HF URL (optional)",
                 placeholder="https://github.com/username/repo",
            )

@@ -140,14 +168,14 @@
             submit_btn.click(
                 fn=submit_evaluation,
-                inputs=[model_name, submission_name, github_url],
+                inputs=[model_id, display_name, url],
                 outputs=result,
             )

-        with gr.TabItem("📊 Model Status"):
+        with gr.TabItem("📊 Submission Status"):
             query = gr.Textbox(
-                label="Model Name or Task ID",
-                placeholder="Enter model name (e.g., facebook/wav2vec2-lv-60-espeak-cv-ft)",
+                label="Model ID or Task ID",
+                placeholder="Enter model ID (e.g., facebook/wav2vec2-lv-60-espeak-cv-ft)",
             )
             status_btn = gr.Button("Check Status")
             status_output = gr.JSON(label="Status")
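The refresh logic in the new `get_latest_leaderboard_html` keeps only the most recent submission per `repo_id` before ranking by score. That pattern can be sketched standalone with pandas (toy data; column names taken from the diff):

```python
import pandas as pd

rows = pd.DataFrame(
    {
        "repo_id": ["org/model-a", "org/model-a", "org/model-b"],
        "submission_timestamp": ["2024-01-01", "2024-03-01", "2024-02-01"],
        "average_fer": [0.30, 0.20, 0.25],
    }
)

# Newest first, then keep the first (i.e. latest) row per repo_id,
# then rank so the best (lowest) score sits at the top.
latest = (
    rows.sort_values("submission_timestamp", ascending=False)
    .drop_duplicates("repo_id", keep="first")
    .sort_values("average_fer", ascending=True)
)
print(latest["repo_id"].tolist())  # → ['org/model-a', 'org/model-b']
```

Because resubmissions only shadow older rows rather than overwrite them, the backing dataset stays append-only, which keeps writes to the Hugging Face Dataset simple.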
app/data.py DELETED
@@ -1,180 +0,0 @@
-# This module handles the data loading and preprocessing for various phoneme transcription datasets.
-
-import torch
-import torchaudio
-
-import zipfile
-from pathlib import Path
-
-# Get absolute path
-CURRENT_DIR = Path(__file__).parent.absolute()
-
-# Constants
-DATA_DIR = CURRENT_DIR / "data"
-TIMIT_PATH = DATA_DIR / "TIMIT.zip"
-
-
-# Abstract data manager class
-class DataManager:
-    """Abstract class for handling dataset operations"""
-
-    def get_file_list(self, subset: str) -> list[str]:
-        """Get list of files for given subset"""
-        raise NotImplementedError
-
-    def load_audio(self, filename: str) -> torch.Tensor:
-        """Load and preprocess audio file"""
-        raise NotImplementedError
-
-    def get_phonemes(self, filename: str) -> str:
-        """Get phoneme sequence from file"""
-        raise NotImplementedError
-
-
-# Implement datasets
-class TimitDataManager(DataManager):
-    """Handles all TIMIT dataset operations"""
-
-    # TIMIT to IPA mapping with direct simplifications
-    _TIMIT_TO_IPA = {
-        # Vowels (simplified)
-        "aa": "ɑ",
-        "ae": "æ",
-        "ah": "ʌ",
-        "ao": "ɔ",
-        "aw": "aʊ",
-        "ay": "aɪ",
-        "eh": "ɛ",
-        "er": "ɹ",  # Simplified from 'ɝ'
-        "ey": "eɪ",
-        "ih": "ɪ",
-        "ix": "i",  # Simplified from 'ɨ'
-        "iy": "i",
-        "ow": "oʊ",
-        "oy": "ɔɪ",
-        "uh": "ʊ",
-        "uw": "u",
-        "ux": "u",  # Simplified from 'ʉ'
-        "ax": "ə",
-        "ax-h": "ə",  # Simplified from 'ə̥'
-        "axr": "ɹ",  # Simplified from 'ɚ'
-        # Consonants
-        "b": "",
-        "bcl": "b",
-        "d": "",
-        "dcl": "d",
-        "g": "",
-        "gcl": "g",
-        "p": "",
-        "pcl": "p",
-        "t": "",
-        "tcl": "t",
-        "k": "",
-        "kcl": "k",
-        "dx": "ɾ",
-        "q": "ʔ",
-        # Fricatives
-        "jh": "dʒ",
-        "ch": "tʃ",
-        "s": "s",
-        "sh": "ʃ",
-        "z": "z",
-        "zh": "ʒ",
-        "f": "f",
-        "th": "θ",
-        "v": "v",
-        "dh": "ð",
-        "hh": "h",
-        "hv": "h",  # Simplified from 'ɦ'
-        # Nasals (simplified)
-        "m": "m",
-        "n": "n",
-        "ng": "ŋ",
-        "em": "m",  # Simplified from 'm̩'
-        "en": "n",  # Simplified from 'n̩'
-        "eng": "ŋ",  # Simplified from 'ŋ̍'
-        "nx": "ɾ",  # Simplified from 'ɾ̃'
-        # Semivowels and Glides
-        "l": "l",
-        "r": "ɹ",
-        "w": "w",
-        "wh": "ʍ",
-        "y": "j",
-        "el": "l",  # Simplified from 'l̩'
-        # Special
-        "epi": "",  # Remove epenthetic silence
-        "h#": "",  # Remove start/end silence
-        "pau": "",  # Remove pause
-    }
-
-    def __init__(self, timit_path: Path):
-        self.timit_path = timit_path
-        self._zip_ = None
-        print(f"TimitDataManager initialized with path: {self.timit_path.absolute()}")
-        if not self.timit_path.exists():
-            raise FileNotFoundError(
-                f"TIMIT dataset not found at {self.timit_path.absolute()}. Try running ./scripts/download_data_lfs.sh again."
-            )
-        else:
-            print("TIMIT dataset file exists!")
-
-    @property
-    def _zip(self):
-        if not self._zip_:
-            self._zip_ = zipfile.ZipFile(self.timit_path, "r")
-        return self._zip_
-
-    def get_file_list(self, subset: str) -> list[str]:
-        """Get list of WAV files for given subset"""
-        files = [
-            f
-            for f in self._zip.namelist()
-            if f.endswith(".WAV") and subset.lower() in f.lower()
-        ]
-        print(f"Found {len(files)} WAV files in {subset} subset")
-        if files:
-            print("First 3 files:", files[:3])
-        return files
-
-    def load_audio(self, filename: str) -> torch.Tensor:
-        """Load and preprocess audio file"""
-        with self._zip.open(filename) as wav_file:
-            waveform, sample_rate = torchaudio.load(wav_file)  # type: ignore
-
-        if waveform.shape[0] > 1:
-            waveform = torch.mean(waveform, dim=0, keepdim=True)
-
-        if sample_rate != 16000:
-            waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)
-
-        waveform = (waveform - waveform.mean()) / (waveform.std() + 1e-7)
-
-        if waveform.dim() == 1:
-            waveform = waveform.unsqueeze(0)
-
-        return waveform
-
-    def get_phonemes(self, filename: str) -> str:
-        """Get cleaned phoneme sequence from PHN file and convert to IPA"""
-        phn_file = filename.replace(".WAV", ".PHN")
-        with self._zip.open(phn_file) as f:
-            phonemes = []
-            for line in f.read().decode("utf-8").splitlines():
-                if line.strip():
-                    _, _, phone = line.split()
-                    phone = self._remove_stress_mark(phone)
-                    # Convert to IPA instead of using simplify_timit
-                    ipa = self._TIMIT_TO_IPA.get(phone.lower(), "")
-                    if ipa:
-                        phonemes.append(ipa)
-        return "".join(phonemes)  # Join without spaces for IPA
-
-    def _remove_stress_mark(self, text: str) -> str:
-        """Removes the combining double inverted breve (͡) from text"""
-        if not isinstance(text, str):
-            raise TypeError("Input must be string")
-        return text.replace("͡", "")
-
-
-# Initialize data managers
-timit_manager = TimitDataManager(TIMIT_PATH)
app/data/test/cache-38f74914f01da443.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a7097497f3a64b59d868eb2b3dadf6887b383555398dec8f3b72e75a295ddb5a
3
+ size 1248
app/data/test/cache-43bad43a3f17100a.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a87f7da6c1210c5efca97e285fdf608b1101e8c6b506a03812ecf082f089aa0
3
+ size 1248
app/data/test/cache-7fc832a0865b46e3.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:966ac1866fb81a68bcb2269ad1293dd2c045022558b6824a87bb66cada9ff28a
3
+ size 1248
app/data/test/cache-8e3b20205f12c8bf.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:78881cfe43c3a668b24c2269adc6219724a7fec0838bcdf74b71e96a583bf0c6
3
+ size 1248
app/data/test/cache-9a41aaef1a199c0a.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d5c1ab32866ac66f93c5798a888db2d32ca3638aa119de45d587325c2d90964d
3
+ size 1248
app/data/test/cache-9a81afba5c72d77e.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aee3c9a01bfb57a914f31c6255c55cdd42c5cbda23fab357fb80c32710e92389
3
+ size 1248
app/data/test/cache-bf2efb6be770547b.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e2b2b01c7d81595b4ba5e97902db7bf2ef353eacebf9912a930d16570948cd2d
3
+ size 1248
app/data/test/cache-ceccabba78df3ad3.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:312a7ed183b7aabac6a4553b31fd55dcd6a4af9a1627978f8117278c540885da
3
+ size 1248
app/data/test/cache-d8c639c50adcd3ec.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4347f285083472be0661457b0b6cdf927a302e556a5584d10cdedd15ca936919
3
+ size 1248
app/data/test/cache-f9690e73716e8fdd.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d0c055e60b8afcc4c763157f34d0d17f683f0f8b578116eaf9e604a3d178d9e5
3
+ size 1248
app/data/test/data-00000-of-00001.arrow ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:510501aa7be7ece974c2e9feaaad94ec5d38a7fe4e35dee9b3bf2ee9a485062c
3
+ size 53582720
app/data/test/dataset_info.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "citation": "",
+   "description": "",
+   "features": {
+     "audio": {
+       "_type": "Audio"
+     },
+     "ipa": {
+       "dtype": "string",
+       "_type": "Value"
+     },
+     "dataset": {
+       "dtype": "string",
+       "_type": "Value"
+     }
+   },
+   "homepage": "",
+   "license": ""
+ }
app/data/test/state.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "_data_files": [
+     {
+       "filename": "data-00000-of-00001.arrow"
+     }
+   ],
+   "_fingerprint": "8693a894a9182281",
+   "_format_columns": [
+     "audio",
+     "ipa",
+     "dataset"
+   ],
+   "_format_kwargs": {},
+   "_format_type": null,
+   "_output_all_columns": false,
+   "_split": null
+ }
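The `state.json` above is how `datasets` records which Arrow shards and columns make up the saved split. A minimal stdlib sketch of reading it (sample string mirrors the file in this commit; not the actual `datasets` loading path, which goes through `Dataset.load_from_disk`):

```python
import json

# Sample state.json content, mirroring app/data/test/state.json from this commit.
state = json.loads("""
{
  "_data_files": [{"filename": "data-00000-of-00001.arrow"}],
  "_fingerprint": "8693a894a9182281",
  "_format_columns": ["audio", "ipa", "dataset"],
  "_split": null
}
""")

# List the Arrow shard files backing the split.
shards = [f["filename"] for f in state["_data_files"]]
```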
app/hf.py ADDED
@@ -0,0 +1,111 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # This module handles interfacing with the huggingface api
2
+
3
+ from typing import Literal
4
+ from datetime import datetime
5
+
6
+ from huggingface_hub import HfApi
7
+ from huggingface_hub.errors import RepositoryNotFoundError
8
+ from datasets import load_dataset, concatenate_datasets, Dataset, Features, Value
9
+ from datasets.exceptions import DatasetNotFoundError
10
+
11
+ api = HfApi()
12
+
13
+ LEADERBOARD_ID = "KoelLabs/_IPA-TRANSCRIPTION-EN-SCORES"
14
+ LEADERBOARD_FEATURES = Features(
15
+ {
16
+ "display_name": Value("string"),
17
+ "repo_id": Value("string"),
18
+ "repo_hash": Value("string"),
19
+ "repo_last_modified": Value("timestamp[s, tz=UTC]"),
20
+ "submission_timestamp": Value("timestamp[s, tz=UTC]"),
21
+ "average_per": Value("float32"),
22
+ "average_fer": Value("float32"),
23
+ "url": Value("string"),
24
+ "fer_TIMIT": Value("float32"),
25
+ "fer_EpaDB": Value("float32"),
26
+ "fer_PSST": Value("float32"),
27
+ "fer_SpeechOcean": Value("float32"),
28
+ "fer_ISLE": Value("float32"),
29
+ }
30
+ )
31
+ LEADERBOARD_DEFAULTS = {
32
+ "url": "",
33
+ "fer_TIMIT": None,
34
+ "fer_EpaDB": None,
35
+ "fer_PSST": None,
36
+ "fer_SpeechOcean": None,
37
+ "fer_ISLE": None,
38
+ }
39
+
40
+
41
+ def get_repo_info(
42
+ repo_id, type: Literal["model", "dataset", "space"] = "model"
43
+ ) -> tuple[str, datetime]:
44
+ try:
45
+ repo_info = api.repo_info(repo_id=repo_id, repo_type=type)
46
+ return repo_info.sha, repo_info.last_modified # type: ignore
47
+ except RepositoryNotFoundError:
48
+ return "", datetime(year=1970, month=1, day=1)
49
+
50
+
51
+ def get_or_create_leaderboard() -> Dataset:
52
+ modified = False
53
+ try:
54
+ dataset: Dataset = load_dataset(LEADERBOARD_ID)["train"] # type: ignore
55
+ except DatasetNotFoundError:
56
+ empty_data = {col: [] for col in LEADERBOARD_FEATURES.keys()}
57
+ dataset = Dataset.from_dict(empty_data, features=LEADERBOARD_FEATURES)
58
+ modified = True
59
+ except ValueError:
60
+ empty_data = {col: [] for col in LEADERBOARD_FEATURES.keys()}
61
+ dataset = Dataset.from_dict(empty_data, features=LEADERBOARD_FEATURES)
62
+
63
+ for col in LEADERBOARD_FEATURES.keys():
64
+ if col not in dataset.column_names:
65
+ modified = True
66
+ dataset = dataset.add_column(col, [LEADERBOARD_DEFAULTS.get(col)] * len(dataset)) # type: ignore
67
+ dataset = dataset.cast_column(col, feature=LEADERBOARD_FEATURES[col])
68
+
69
+ if modified:
70
+ dataset.push_to_hub(LEADERBOARD_ID, private=True)
71
+
72
+ return dataset
73
+
74
+
75
+ def add_leaderboard_entry(
76
+ display_name: str,
77
+ repo_id: str,
78
+ repo_hash: str,
79
+ repo_last_modified: datetime,
80
+ submission_timestamp: datetime,
81
+ average_per: float,
82
+ average_fer: float,
83
+ url: str,
84
+ per_dataset_fers: dict = {},
85
+ ):
86
+ existing_dataset = get_or_create_leaderboard()
87
+ new_row = Dataset.from_dict(
88
+ dict(
89
+ display_name=[display_name],
90
+ repo_id=[repo_id],
91
+ repo_hash=[repo_hash],
92
+ repo_last_modified=[repo_last_modified.replace(microsecond=0)],
93
+ submission_timestamp=[submission_timestamp.replace(microsecond=0)],
94
+ average_per=[average_per],
95
+ average_fer=[average_fer],
96
+ url=[url],
97
+ fer_TIMIT=[per_dataset_fers.get("TIMIT")],
98
+ fer_EpaDB=[per_dataset_fers.get("EpaDB")],
99
+ fer_PSST=[per_dataset_fers.get("PSST")],
100
+ fer_SpeechOcean=[per_dataset_fers.get("SpeechOcean")],
101
+ fer_ISLE=[per_dataset_fers.get("ISLE")],
102
+ ),
103
+ features=LEADERBOARD_FEATURES,
104
+ )
105
+ combined_dataset = concatenate_datasets([existing_dataset, new_row])
106
+ combined_dataset.push_to_hub(LEADERBOARD_ID, private=True)
107
+
108
+
109
+ if __name__ == "__main__":
110
+ print(get_repo_info(LEADERBOARD_ID, type="dataset"))
111
+ print(get_or_create_leaderboard().to_pandas().head(5)) # type: ignore
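The `add_leaderboard_entry` function above fills every per-dataset FER column, leaving unscored datasets as `None` per `LEADERBOARD_DEFAULTS`. A stdlib-only sketch of that row-assembly logic (the `build_row` helper is hypothetical and stands in for the `Dataset.from_dict` call; it does not touch the Hub):

```python
from datetime import datetime, timezone

# Defaults mirroring LEADERBOARD_DEFAULTS in app/hf.py: per-dataset FERs
# stay None until the model has actually been scored on that dataset.
LEADERBOARD_DEFAULTS = {
    "url": "",
    "fer_TIMIT": None,
    "fer_EpaDB": None,
    "fer_PSST": None,
    "fer_SpeechOcean": None,
    "fer_ISLE": None,
}


def build_row(display_name, repo_id, average_per, average_fer, per_dataset_fers=None):
    """Hypothetical helper: assemble one leaderboard row dict,
    filling unscored datasets from the defaults."""
    row = dict(LEADERBOARD_DEFAULTS)
    for name, score in (per_dataset_fers or {}).items():
        row[f"fer_{name}"] = score
    row.update(
        display_name=display_name,
        repo_id=repo_id,
        average_per=average_per,
        average_fer=average_fer,
        # Timestamps are truncated to whole seconds, matching
        # .replace(microsecond=0) in add_leaderboard_entry.
        submission_timestamp=datetime.now(timezone.utc).replace(microsecond=0),
    )
    return row


row = build_row("demo", "org/model", 0.25, 0.10, {"TIMIT": 0.08})
```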
app/inference.py CHANGED
@@ -1,162 +1,50 @@
- # This module handles model inference and evaluation.
+ # This module handles model inference
 
- from datetime import datetime
- from typing import Optional
-
  import torch
  from transformers import AutoProcessor, AutoModelForCTC
 
- from data import timit_manager
- from phone_metrics import PhoneErrorMetrics
-
- # Initialize evaluation metric
- phone_errors = PhoneErrorMetrics()
-
-
- class ModelManager:
-     """Handles model loading and inference"""
-
-     def __init__(self):
-         self.models = {}
-         self.processors = {}
-         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-         self.batch_size = 32
-
-     def get_model_and_processor(self, model_name: str):
-         """Get or load model and processor"""
-         if model_name not in self.models:
-             print("Loading processor with phoneme tokenizer...")
-             processor = AutoProcessor.from_pretrained(model_name)
-
-             print("Loading model...", {model_name})
-             model = AutoModelForCTC.from_pretrained(model_name).to(self.device)
-
-             self.models[model_name] = model
-             self.processors[model_name] = processor
-
-         return self.models[model_name], self.processors[model_name]
-
-     def transcribe(self, audio_list: list[torch.Tensor], model_name: str) -> list[str]:
-         """Transcribe a batch of audio using specified model"""
-         model, processor = self.get_model_and_processor(model_name)
-         if not model or not processor:
-             raise Exception("Model and processor not loaded")
-
-         # Process audio in batches
-         all_predictions = []
-         for i in range(0, len(audio_list), self.batch_size):
-             batch_audio = audio_list[i : i + self.batch_size]
-
-             # Pad sequence within batch
-             max_length = max(audio.shape[-1] for audio in batch_audio)
-             padded_audio = torch.zeros((len(batch_audio), 1, max_length))
-             attention_mask = torch.zeros((len(batch_audio), max_length))
-
-             for j, audio in enumerate(batch_audio):
-                 padded_audio[j, :, : audio.shape[-1]] = audio
-                 attention_mask[j, : audio.shape[-1]] = 1
-
-             # Process batch
-             inputs = processor(
-                 padded_audio.squeeze(1).numpy(),
-                 sampling_rate=16000,
-                 return_tensors="pt",
-                 padding=True,
-             )
-
-             input_values = inputs.input_values.to(self.device)
-             attention_mask = inputs.get("attention_mask", attention_mask).to(
-                 self.device
-             )
-
-             with torch.no_grad():
-                 outputs = model(
-                     input_values=input_values, attention_mask=attention_mask
-                 )
-                 logits = outputs.logits
-             predicted_ids = torch.argmax(logits, dim=-1)
-             predictions = processor.batch_decode(
-                 predicted_ids, skip_special_tokens=True
-             )
-             predictions = [pred.replace(" ", "") for pred in predictions]
-             all_predictions.extend(predictions)
-
-         return all_predictions
-
-
- def evaluate_model(
-     model_name: str,
-     subset: str = "test",
-     max_samples: Optional[int] = None,
- ):
-     """Evaluate model on TIMIT dataset"""
-
-     files = timit_manager.get_file_list(subset)
-     if max_samples:
-         files = files[:max_samples]
-
-     results = []
-     total_per = total_pwed = 0
-
-     # Process files in batches
-     batch_size = model_manager.batch_size
-     for i in range(0, len(files), batch_size):
-         batch_files = files[i : i + batch_size]
-
-         # Load batch audio and ground truth
-         batch_audio = []
-         batch_ground_truth = []
-         for wav_file in batch_files:
-             audio = timit_manager.load_audio(wav_file)
-             ground_truth = timit_manager.get_phonemes(wav_file)
-             batch_audio.append(audio)
-             batch_ground_truth.append(ground_truth)
-
-         # Get predictions for batch
-         predictions = model_manager.transcribe(batch_audio, model_name)
-
-         # Calculate metrics for each file in batch
-         for _, (wav_file, prediction, ground_truth) in enumerate(
-             zip(batch_files, predictions, batch_ground_truth)
-         ):
-             metrics = phone_errors.compute(
-                 predictions=[prediction],
-                 references=[ground_truth],
-                 is_normalize_pfer=True,
-             )
-
-             per = metrics["phone_error_rates"][0]
-             pwed = metrics["phone_feature_error_rates"][0]
-
-             results.append(
-                 {
-                     "file": wav_file,
-                     "ground_truth": ground_truth,
-                     "prediction": prediction,
-                     "per": per,
-                     "pwed": pwed,
-                 }
-             )
-
-             total_per += per
-             total_pwed += pwed
-
-     if not results:
-         raise Exception("No files were successfully processed")
-
-     avg_per = total_per / len(results)
-     avg_pwed = total_pwed / len(results)
-
-     return {
-         "model": model_name,
-         "subset": subset,
-         "num_files": len(results),
-         "average_per": avg_per,
-         "average_pwed": avg_pwed,
-         "detailed_results": results[:5],
-         "timestamp": datetime.now().isoformat(),
-     }
-
-
- # Initialize managers
- model_manager = ModelManager()
+ DEVICE = (
+     "cuda"
+     if torch.cuda.is_available()
+     else "mps" if torch.backends.mps.is_available() else "cpu"
+ )
+
+ # set espeak library path for macOS
+ import sys
+
+ if sys.platform == "darwin":
+     from phonemizer.backend.espeak.wrapper import EspeakWrapper
+
+     _ESPEAK_LIBRARY = "/opt/homebrew/Cellar/espeak/1.48.04_1/lib/libespeak.1.1.48.dylib"
+     EspeakWrapper.set_library(_ESPEAK_LIBRARY)
+
+
+ def clear_cache():
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+         torch.cuda.ipc_collect()
+     torch.mps.empty_cache()
+
+
+ def load_model(model_id, device=DEVICE):
+     processor = AutoProcessor.from_pretrained(model_id)
+     model = AutoModelForCTC.from_pretrained(model_id).to(device)
+     return model, processor
+
+
+ def transcribe(audio, model, processor) -> str:
+     input_values = (
+         processor(
+             [audio],
+             sampling_rate=processor.feature_extractor.sampling_rate,
+             return_tensors="pt",
+             padding=True,
+         )
+         .input_values.type(torch.float32)
+         .to(model.device)
+     )
+     with torch.no_grad():
+         logits = model(input_values).logits
+
+     predicted_ids = torch.argmax(logits, dim=-1)
+     return processor.decode(predicted_ids[0])
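The new `transcribe` helper above does greedy CTC decoding: take the argmax token per frame, then `processor.decode` collapses repeated tokens and drops the CTC blank. A stdlib sketch of that collapse rule (the token table is a made-up illustration, not a real model vocabulary):

```python
def ctc_greedy_collapse(ids, blank=0, vocab=None):
    """Collapse consecutive repeats and drop blanks --
    the rule processor.decode applies to the argmax ids."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    if vocab is None:
        return out
    return "".join(vocab[i] for i in out)


# Hypothetical vocabulary; id 0 is the CTC blank.
vocab = {1: "h", 2: "ɛ", 3: "l", 4: "oʊ"}
# A blank between the two 'l' frames is what lets a doubled phone survive.
ctc_greedy_collapse([0, 1, 1, 2, 0, 3, 3, 0, 3, 4], vocab=vocab)  # → "hɛlloʊ"
```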
app/metrics.py ADDED
@@ -0,0 +1,33 @@
+ # This module defines evaluation metrics
+
+ from yaml import warnings
+
+ warnings({"YAMLLoadWarning": False})
+
+ import panphon
+ import panphon.distance
+
+ ft = panphon.FeatureTable()
+ panphon_dist = panphon.distance.Distance()
+ inverse_double_weight_sum = 1 / (sum(ft.weights) * 2)
+
+
+ def per(prediction, ground_truth):
+     """
+     Phoneme Error Rate: the number of edits (substitutions, insertions, deletions)
+     needed to transform the prediction into the ground truth divided by the length of the ground truth.
+     """
+     return panphon_dist.fast_levenshtein_distance(prediction, ground_truth) / len(
+         ground_truth
+     )
+
+
+ def fer(prediction, ground_truth):
+     """
+     Feature Error Rate: the edits weighted by their acoustic features summed up and divided by the length of the ground truth.
+     """
+     return (
+         inverse_double_weight_sum
+         * panphon_dist.weighted_feature_edit_distance(ground_truth, prediction)
+         / len(ground_truth)
+     )
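The `per` metric above is plain Levenshtein edit distance normalized by reference length (panphon's `fast_levenshtein_distance` just computes it over IPA segments). A self-contained sketch without panphon, assuming the strings are already one character per phone:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance;
    substitution, insertion, and deletion all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def per(prediction, ground_truth):
    """Phoneme Error Rate: edits divided by reference length."""
    return levenshtein(prediction, ground_truth) / len(ground_truth)


per("kæt", "kɑt")  # one substitution over three phones → 1/3
```

The real `fer` differs only in weighting each edit by panphon's articulatory-feature distance instead of a flat cost of 1.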
app/phone_metrics.py DELETED
@@ -1,108 +0,0 @@
- """
- This module implements phone error metrics based on the work from ginic/phone_errors.
- Original implementation: https://huggingface.co/spaces/ginic/phone_errors
-
- Citation:
- @inproceedings{Mortensen-et-al:2016,
-     author = {David R. Mortensen and
-         Patrick Littell and
-         Akash Bharadwaj and
-         Kartik Goyal and
-         Chris Dyer and
-         Lori S. Levin},
-     title = {PanPhon: {A} Resource for Mapping {IPA} Segments to Articulatory Feature Vectors},
-     booktitle = {Proceedings of {COLING} 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
-     pages = {3475--3484},
-     publisher = {{ACL}},
-     year = {2016}
- }
- """
-
- import numpy as np
- import panphon.distance
-
-
- class PhoneErrorMetrics:
-     def __init__(self, feature_model: str = "segment"):
-         """Initialize the phone error metrics calculator.
-
-         Args:
-             feature_model (str): panphon feature parsing model ("strict", "permissive", or "segment")
-         """
-         self.distance_computer = panphon.distance.Distance(feature_model=feature_model)
-
-     def _phone_error_rate(self, prediction: str, reference: str) -> float:
-         """Compute phone error rate between prediction and reference.
-
-         Args:
-             prediction (str): Predicted IPA string
-             reference (str): Reference IPA string
-
-         Returns:
-             float: Phone error rate
-         """
-         if not reference:
-             raise ValueError("Reference string cannot be empty")
-
-         pred_phones = self.distance_computer.fm.ipa_segs(prediction)
-         ref_phones = self.distance_computer.fm.ipa_segs(reference)
-
-         phone_edits = self.distance_computer.min_edit_distance(
-             lambda x: 1,  # deletion cost
-             lambda x: 1,  # insertion cost
-             lambda x, y: 0 if x == y else 1,  # substitution cost
-             [[]],
-             pred_phones,
-             ref_phones,
-         )
-
-         return phone_edits / len(ref_phones)
-
-     def compute(
-         self,
-         predictions: list[str],
-         references: list[str],
-         is_normalize_pfer: bool = False,
-     ) -> dict:
-         """Compute phone error metrics between predictions and references.
-
-         Args:
-             predictions (List[str]): List of predicted IPA strings
-             references (List[str]): List of reference IPA strings
-             is_normalize_pfer (bool): Whether to normalize phone feature error rates
-
-         Returns:
-             Dict containing:
-                 - phone_error_rates: List of PER for each pair
-                 - mean_phone_error_rate: Average PER
-                 - phone_feature_error_rates: List of PFER for each pair
-                 - mean_phone_feature_error_rate: Average PFER
-                 - feature_error_rates: List of FER for each pair
-                 - mean_feature_error_rate: Average FER
-         """
-         phone_error_rates = []
-         feature_error_rates = []
-         hamming_distances = []
-
-         for pred, ref in zip(predictions, references):
-             if is_normalize_pfer:
-                 hd = self.distance_computer.hamming_feature_edit_distance_div_maxlen(
-                     pred, ref
-                 )
-             else:
-                 hd = self.distance_computer.hamming_feature_edit_distance(pred, ref)
-
-             hamming_distances.append(hd)
-             per = self._phone_error_rate(pred, ref)
-             phone_error_rates.append(per)
-             fer = self.distance_computer.feature_error_rate(pred, ref)
-             feature_error_rates.append(fer)
-
-         return {
-             "phone_error_rates": phone_error_rates,
-             "mean_phone_error_rate": float(np.mean(phone_error_rates)),
-             "phone_feature_error_rates": hamming_distances,
-             "mean_phone_feature_error_rate": float(np.mean(hamming_distances)),
-             "feature_error_rates": feature_error_rates,
-             "mean_feature_error_rate": float(np.mean(feature_error_rates)),
-         }
app/queue/leaderboard.json DELETED
@@ -1,192 +0,0 @@
- [
-   {
-     "submission_id": "8e6a3a00-59fa-4a24-861d-a132a8212658",
-     "submission_name": "facebook espeak",
-     "model": "facebook/wav2vec2-lv-60-espeak-cv-ft",
-     "average_per": 0.33667301260691423,
-     "average_pwed": 0.1276725657099669,
-     "subset": "timit-test",
-     "github_url": "https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md",
-     "submission_date": "2024-12-05T07:32:06.850230"
-   },
-   {
-     "submission_id": "70aceb68-ad86-4a83-9998-08adb27b4d5c",
-     "submission_name": "english phoneme model",
-     "model": "KoelLabs/xlsr-timit-b0",
-     "average_per": 0.12572285528714347,
-     "average_pwed": 0.06476636812791145,
-     "subset": "timit-test",
-     "github_url": "https://github.com/KoelLabs/",
-     "submission_date": "2024-12-05T08:25:24.982477"
-   },
-   {
-     "submission_id": "80b57299-b3ab-4caf-ac4a-898c8398046e",
-     "submission_name": "speech 31 model",
-     "model": "speech31/wav2vec2-large-TIMIT-IPA",
-     "average_per": 0.4415425496841929,
-     "average_pwed": 0.18625930002594002,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/speech31/wav2vec2-large-TIMIT-IPA2",
-     "submission_date": "2024-12-05T09:36:14.570315"
-   },
-   {
-     "submission_id": "0cbcab0a-bd07-421f-82a0-480c9507a214",
-     "submission_name": "jubiliano model wav2vec2",
-     "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5",
-     "average_per": 0.6318471187460027,
-     "average_pwed": 0.222932144739126,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces/tree/d5312009d8e620b183c334dfdd9ffc6b4f06f8c1",
-     "submission_date": "2024-12-05T10:17:21.334530"
-   },
-   {
-     "submission_id": "0fc29c54-3db2-46b6-aeee-c96484306751",
-     "submission_name": "xlsr 53 model",
-     "model": "facebook/wav2vec2-xlsr-53-espeak-cv-ft",
-     "average_per": 0.348845592557092,
-     "average_pwed": 0.1386742019529415,
-     "subset": "timit-test",
-     "github_url": "https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md",
-     "submission_date": "2024-12-05T10:34:26.157054"
-   },
-   {
-     "submission_id": "a23026ec-acac-4481-9761-f9368b4b94f1",
-     "submission_name": "ginic model wav2vec2 finetuned on buckeye",
-     "model": "ginic/hyperparam_tuning_1_wav2vec2-large-xlsr-buckeye-ipa",
-     "average_per": 0.2766466385175833,
-     "average_pwed": 0.10410683992600853,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/ginic/vary_individuals_old_only_1_wav2vec2-large-xlsr-buckeye-ipa",
-     "submission_date": "2024-12-05T11:06:07.984825"
-   },
-   {
-     "submission_id": "e3bbf521-cc32-43a6-bf1c-5ddc6bce04ab",
-     "submission_name": "koel labs initial ",
-     "model": "KoelLabs/xlsr-timit-a0",
-     "average_per": 0.24242141955346685,
-     "average_pwed": 0.17395311976938,
-     "subset": "timit-test",
-     "github_url": "https://github.com/KoelLabs/ML/",
-     "submission_date": "2024-12-12T16:07:25.391145"
-   },
-   {
-     "submission_id": "02f223d4-7b98-4613-9377-19b74defe308",
-     "submission_name": "wav2vec2 ipa eng ",
-     "model": "snu-nia-12/wav2vec2-large_nia12_phone-ipa_english",
-     "average_per": 0.4847029843149011,
-     "average_pwed": 0.2072006544586948,
-     "subset": "timit-test",
-     "github_url": null,
-     "submission_date": "2024-12-18T22:01:20.855881"
-   },
-   {
-     "submission_id": "bed08468-42c7-459f-a46d-49ead50abfbc",
-     "submission_name": "fine-tuned version of facebook/wav2vec2-xls-r-300m on the Timit dataset",
-     "model": "vitouphy/wav2vec2-xls-r-300m-timit-phoneme",
-     "average_per": 0.2561961414705681,
-     "average_pwed": 0.1378394393452702,
-     "subset": "timit-test",
-     "github_url": "https://www.kaggle.com/code/vitouphy/phoneme-recognition-with-wav2vec2",
-     "submission_date": "2024-12-18T22:50:59.627338"
-   },
-   {
-     "submission_id": "4086072e-9368-442f-97cd-1fda6bf6656e",
-     "submission_name": "wav2vec2 model",
-     "model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa-plus-2000",
-     "average_per": 0.6479484324708775,
-     "average_pwed": 0.18710002665151734,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
-     "submission_date": "2024-12-18T23:29:27.322286"
-   },
-   {
-     "submission_id": "d0b2f8b4-20f8-45b4-b1a5-c81390d75b29",
-     "submission_name": "wav2vec2 non-english transcription",
-     "model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
-     "average_per": 0.6417205190285036,
-     "average_pwed": 0.19048963968896404,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
-     "submission_date": "2024-12-19T07:41:18.135985"
-   },
-   {
-     "submission_id": "3bbb0f03-31a5-45b0-bde3-bbf574f19983",
-     "submission_name": "phonetic transcription with the Buckeye corpus, from xlsr-53 model",
-     "model": "ginic/gender_split_70_female_4_wav2vec2-large-xlsr-buckeye-ipa",
-     "average_per": 0.2810165988557621,
-     "average_pwed": 0.10703377161801164,
-     "subset": "timit-test",
-     "github_url": "https://github.com/ginic/multipa/tree/buckeye_experiments",
-     "submission_date": "2024-12-20T13:45:52.010575"
-   },
-   {
-     "submission_id": "2ed095f7-4712-4539-87b6-1e8588ac92a3",
-     "submission_name": "phonetic transcription",
-     "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.9.2WithoutSpaces",
-     "average_per": 0.9537775908999574,
-     "average_pwed": 0.9351204819224959,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces",
-     "submission_date": "2024-12-20T14:21:32.293694"
-   },
-   {
-     "submission_id": "9cf02ce8-fc43-4d23-a8bb-b44e3116a93c",
-     "submission_name": "Jubliano xlsr model",
-     "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
-     "average_per": 0.9887075544197294,
-     "average_pwed": 0.9692486915717254,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-nl1.1",
-     "submission_date": "2024-12-20T15:40:51.632895"
-   },
-   {
-     "submission_id": "d5013845-f5c9-428a-8b39-7db066bb9f05",
-     "submission_name": "speech31 phoneme transcription english",
-     "model": "speech31/wavlm-large-english-ipa",
-     "average_per": 0.3694017596969614,
-     "average_pwed": 0.1356824900612308,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/speech31/wavlm-large-english-ipa",
-     "submission_date": "2024-12-20T16:26:47.982209"
-   },
-   {
-     "submission_id": "362c788d-bc2e-427d-8c74-105f6235cf62",
-     "submission_name": "speech31 xlsr model",
-     "model": "speech31/XLS-R-300m-english-ipa",
-     "average_per": 0.36382554692045954,
-     "average_pwed": 0.1299702312124616,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/speech31/XLS-R-300m-english-ipa",
-     "submission_date": "2024-12-20T16:47:54.826509"
-   },
-   {
-     "submission_id": "49e22782-0af1-4313-bc0c-60cb2f28d78f",
-     "submission_name": "model is a fine-tuned version of facebook/wav2vec2-large on the TIMIT dataset",
-     "model": "speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
-     "average_per": 0.44563344149564776,
-     "average_pwed": 0.18844914029048124,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
-     "submission_date": "2024-12-20T17:05:35.213738"
-   },
-   {
-     "submission_id": "26c04108-1131-435c-95f1-bb56b2aff06c",
-     "submission_name": "fine-tuned version of facebook/wav2vec2-large on the None dataset",
-     "model": "speech31/wav2vec2-large-TIMIT-IPA2",
-     "average_per": 0.4847029843149011,
-     "average_pwed": 0.2072006544586948,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/speech31/wav2vec2-large-TIMIT-IPA2",
-     "submission_date": "2024-12-20T22:50:50.645178"
-   },
-   {
-     "submission_id": "4126d265-418f-4d11-8a29-4e69f064f1dd",
-     "submission_name": "ginic model, facebook/wav2vec2-large-xlsr-53 fine tuned",
-     "model": "ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
-     "average_per": 0.2807914104790719,
-     "average_pwed": 0.10494355278037441,
-     "subset": "timit-test",
-     "github_url": "https://huggingface.co/ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
-     "submission_date": "2024-12-21T01:31:04.862397"
-   }
- ]
app/queue/results.json DELETED
@@ -1,1014 +0,0 @@
- [
-   {
-     "task_id": "721b4c64-a825-42d3-bb0a-bdff9ee1ed0f",
-     "model": "facebook/wav2vec2-lv-60-espeak-cv-ft",
-     "subset": "timit-test",
-     "num_files": 1680,
-     "average_per": 0.33667301260691423,
-     "average_pwed": 0.1276725657099669,
-     "detailed_results": [
-       {
-         "file": "data/TEST/DR1/FAKS0/SA1.WAV",
-         "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
-         "prediction": "ʃiːhædjɚdɑːɹksuːɾɪnɡɹiːsiwɑːʃwɑːɾɚɹɑːljiː",
-         "per": 0.3939393939393939,
-         "pwed": 0.13888888888888887
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SA2.WAV",
-         "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
-         "prediction": "doʊntæskmiːtəkæɹiɐnoɪliɹæɡlaɪkðæt",
-         "per": 0.32142857142857145,
-         "pwed": 0.13541666666666666
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
-         "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
-         "prediction": "hɪzkæptənwʌzθɪnændhæɡɚdændhɪzbjuːɾɪfəlbuːtswɜːwɔːɹnændʃæbi",
-         "per": 0.3617021276595745,
-         "pwed": 0.13915094339622644
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
-         "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
-         "prediction": "ðəɹiːzənzfɜːðɪsdaɪvsiːmdfuːlɪʃnaʊ",
-         "per": 0.20689655172413793,
-         "pwed": 0.022988505747126433
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI943.WAV",
-         "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
-         "prediction": "pɹədʌkʃənmeɪfɔːlfɑːɹbᵻloʊɛkspɛkteɪʃənz",
-         "per": 0.36363636363636365,
-         "pwed": 0.1392857142857143
-       }
-     ],
-     "timestamp": "2024-12-05T07:32:06.849017"
-   },
-   {
-     "task_id": "d6fe0956-b5b4-4105-835e-8dee1872ee4d",
-     "model": "KoelLabs/xlsr-timit-b0",
-     "subset": "timit-test",
-     "num_files": 1680,
-     "average_per": 0.12572285528714347,
-     "average_pwed": 0.06476636812791145,
-     "detailed_results": [
-       {
-         "file": "data/TEST/DR1/FAKS0/SA1.WAV",
-         "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
-         "prediction": "ʃihædjɹdɑɹksuɾɪnɡɹisiwɑʃwɔɾɹʔɔljɪɹ",
-         "per": 0.12121212121212122,
-         "pwed": 0.037990196078431376
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SA2.WAV",
-         "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
-         "prediction": "oʊnæskmitikæɹinɔɪliɹæɡlaɪkðæt",
-         "per": 0.14285714285714285,
-         "pwed": 0.10632183908045977
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
-         "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
-         "prediction": "hɪzkæpinwəsθɪnhæɡɹdinizbjuɾiflbutswɹwɔɹninʃæbi",
-         "per": 0.10638297872340426,
-         "pwed": 0.0425531914893617
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
-         "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
-         "prediction": "ðəɹiznzfɹðistaɪvsimdfuliʃnaʊ",
-         "per": 0.13793103448275862,
-         "pwed": 0.04166666666666667
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI943.WAV",
-         "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
-         "prediction": "pɹdʌkʃnmeɪfɔlfɑɹbloʊɛkspɛkeɪʃəns",
-         "per": 0.21212121212121213,
-         "pwed": 0.10858585858585859
-       }
-     ],
-     "timestamp": "2024-12-05T08:25:24.980111"
-   },
-   {
-     "task_id": "dbf4642a-fb13-402c-8a74-cc41fc4be599",
-     "model": "speech31/wav2vec2-large-TIMIT-IPA",
-     "subset": "timit-test",
-     "num_files": 1680,
-     "average_per": 0.4415425496841929,
-     "average_pwed": 0.18625930002594002,
-     "detailed_results": [
-       {
-         "file": "data/TEST/DR1/FAKS0/SA1.WAV",
-         "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
-         "prediction": "ʃihædjʊrdɑrksutɪngrisiwɑʃwɔtərɔljɪrrrɪrɪrʃ",
-         "per": 0.5757575757575758,
-         "pwed": 0.25
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SA2.WAV",
-         "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
-         "prediction": "doʊntæskmitɪkɛri��nɔɪliræglaɪkðəttm",
-         "per": 0.35714285714285715,
-         "pwed": 0.172979797979798
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
-         "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
-         "prediction": "hɪzkæptɪnwɑzθɪnəndhægərdəndhɪzbjutəfəlbutswərwɔrnəndʃæbi",
-         "per": 0.40425531914893614,
-         "pwed": 0.17500000000000004
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
-         "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
-         "prediction": "ðərizɪənzfərðɪstaɪvsimdfulɪʃnaʊaʊaʊ",
-         "per": 0.3793103448275862,
-         "pwed": 0.18928571428571428
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI943.WAV",
-         "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
-         "prediction": "prədəkʃənmeɪfɔlfɑrbɪloʊɛkspɛkteɪʃənzd",
-         "per": 0.3939393939393939,
-         "pwed": 0.13626126126126126
-       }
-     ],
-     "timestamp": "2024-12-05T09:36:14.568321"
-   },
-   {
-     "task_id": "912449a4-d7ed-4af4-b5be-5c2c57ec09ff",
-     "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5",
-     "subset": "timit-test",
-     "num_files": 1680,
-     "average_per": 0.6318471187460027,
-     "average_pwed": 0.222932144739126,
-     "detailed_results": [
-       {
-         "file": "data/TEST/DR1/FAKS0/SA1.WAV",
-         "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
-         "prediction": "ʒihɛldjydɑrksydənrisiwɑswadərɑlhir",
-         "per": 0.5454545454545454,
-         "pwed": 0.11764705882352941
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SA2.WAV",
-         "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
-         "prediction": "dɑnraːstɪkmədəkaːrənoːjliralɪkaːn",
-         "per": 0.7857142857142857,
-         "pwed": 0.2341954022988506
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
-         "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
-         "prediction": "xisʃktəʋɑstɪnɛnhɛɪɡərdɛnenzbjudəvɔlbutvɔːrʋɔrnənʃaːbi",
-         "per": 0.6595744680851063,
-         "pwed": 0.18382352941176472
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
-         "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
-         "prediction": "dərizənsvərdəstajfzimtvuləsna",
-         "per": 0.6206896551724138,
-         "pwed": 0.11781609195402297
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI943.WAV",
-         "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
-         "prediction": "pːdkəmeːvɑlvɑrbəloɛkspɛkteːʃəns",
-         "per": 0.5454545454545454,
-         "pwed": 0.2171717171717172
-       }
-     ],
-     "timestamp": "2024-12-05T10:17:21.331572"
-   },
-   {
-     "task_id": "c79df17e-2bb2-4253-ae26-f7cc6ab21265",
-     "model": "facebook/wav2vec2-xlsr-53-espeak-cv-ft",
-     "subset": "timit-test",
-     "num_files": 1680,
-     "average_per": 0.348845592557092,
-     "average_pwed": 0.1386742019529415,
-     "detailed_results": [
-       {
-         "file": "data/TEST/DR1/FAKS0/SA1.WAV",
-         "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
-         "prediction": "ʃiːhædjɚdksuːtɪnɡɹiːsiwɑːʃwɑːɾɚɑːljɪ",
-         "per": 0.48484848484848486,
-         "pwed": 0.21338383838383837
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SA2.WAV",
-         "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
-         "prediction": "doːntæskmitəkæɹiənoɪliɹæɡlaɪkðæt",
-         "per": 0.32142857142857145,
-         "pwed": 0.12634408602150538
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
-         "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
-         "prediction": "hɪzkæptənwʌzθɪnænhæɡɚdændhɪzbjuːɾɪfʊbuːtswɚwoːnəndʃæbi",
-         "per": 0.3617021276595745,
-         "pwed": 0.13095238095238093
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
-         "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
-         "prediction": "ðəɹiːzənzfɚðəsdɑːvsiːmdfuːlɪʃnæ",
-         "per": 0.3793103448275862,
-         "pwed": 0.12068965517241376
-       },
-       {
-         "file": "data/TEST/DR1/FAKS0/SI943.WAV",
-         "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
-         "prediction": "pɹədʌkʃənmeɪfɑːlfɑːbəloʊɛkspɛkteɪʃənz",
-         "per": 0.36363636363636365,
-         "pwed": 0.14404761904761906
-       }
-     ],
-     "timestamp": "2024-12-05T10:34:26.154521"
-   },
-   {
-     "task_id": "f36060e6-a746-44dc-a527-54995b270053",
-     "model": "ginic/hyperparam_tuning_1_wav2vec2-large-xlsr-buckeye-ipa",
-     "subset": "timit-test",
-     "num_files": 1680,
-     "average_per": 0.2766466385175833,
-     "average_pwed": 0.10410683992600853,
-     "detailed_results": [
-       {
-         "file": "data/TEST/DR1/FAKS0/SA1.WAV",
-         "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
-         "prediction": "ʃihædjɹ̩dɑɹksuɾɪnɡɹeɪsiwɑʃwɔɾɹ̩ɔljiɹ"
244
- "per": 0.24242424242424243,
245
- "pwed": 0.09926470588235292
246
- },
247
- {
248
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
249
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
250
- "prediction": "doʊndæskmidɪkæɹiɛnɔɪliɹæɡlaɪkðæʔ",
251
- "per": 0.32142857142857145,
252
- "pwed": 0.14192708333333334
253
- },
254
- {
255
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
256
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
257
- "prediction": "hɪzkæptɪnwʌzθɪnɛnhæɡɹ̩dɛnɪzbjuɾʌfl̩butswɹ̩wɔɹnɛnʃæbi",
258
- "per": 0.2553191489361702,
259
- "pwed": 0.05357142857142857
260
- },
261
- {
262
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
263
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
264
- "prediction": "ðʌɹizʌnzfɹ̩ðʌstaɪvsimdfulɪʃnaʊ",
265
- "per": 0.20689655172413793,
266
- "pwed": 0.01293103448275862
267
- },
268
- {
269
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
270
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
271
- "prediction": "pɹʌdʌkʃʌnmeɪfɔlfɑɹbʌloʊɛkspɛkteɪʃʌns",
272
- "per": 0.2727272727272727,
273
- "pwed": 0.10416666666666667
274
- }
275
- ],
276
- "timestamp": "2024-12-05T11:06:07.981224"
277
- },
278
- {
279
- "task_id": "47d56349-8111-4bda-a47f-e007dbedd36d",
280
- "model": "KoelLabs/xlsr-timit-a0",
281
- "subset": "timit-test",
282
- "num_files": 1680,
283
- "average_per": 0.24242141955346685,
284
- "average_pwed": 0.17395311976938,
285
- "detailed_results": [
286
- {
287
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
288
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
289
- "prediction": "ʃihædjɹdɑɹksuɾɪnɡɹisiwɑʃwɔɾɹʔɔljɪɹ",
290
- "per": 0.12121212121212122,
291
- "pwed": 0.037990196078431376
292
- },
293
- {
294
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
295
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
296
- "prediction": "ɪoʊnæskmitikæɹinɔɪliɹæɡlaɪkðt",
297
- "per": 0.21428571428571427,
298
- "pwed": 0.1695402298850575
299
- },
300
- {
301
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
302
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
303
- "prediction": "hɪzkæpinwəsθɪninhæɡɹdinhizbjuɾiflbutswɹwɔɹnintʃæbi",
304
- "per": 0.1276595744680851,
305
- "pwed": 0.06499999999999999
306
- },
307
- {
308
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
309
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
310
- "prediction": "ðəɹiznzfɹðistaɪ",
311
- "per": 0.5862068965517241,
312
- "pwed": 0.4899425287356322
313
- },
314
- {
315
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
316
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
317
- "prediction": "ɹidʌkʃinmeɪfɔlfɑɹbəloʊɛkspɛkeɪ",
318
- "per": 0.21212121212121213,
319
- "pwed": 0.1553030303030303
320
- }
321
- ],
322
- "timestamp": "2024-12-12T15:53:07.584096"
323
- },
324
- {
325
- "task_id": "51dd5735-63bd-4fe5-a588-c0fc079076e0",
326
- "model": "KoelLabs/xlsr-timit-a0",
327
- "subset": "timit-test",
328
- "num_files": 1680,
329
- "average_per": 0.24242141955346685,
330
- "average_pwed": 0.17395311976938,
331
- "detailed_results": [
332
- {
333
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
334
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
335
- "prediction": "ʃihædjɹdɑɹksuɾɪnɡɹisiwɑʃwɔɾɹʔɔljɪɹ",
336
- "per": 0.12121212121212122,
337
- "pwed": 0.037990196078431376
338
- },
339
- {
340
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
341
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
342
- "prediction": "ɪoʊnæskmitikæɹinɔɪliɹæɡlaɪkðt",
343
- "per": 0.21428571428571427,
344
- "pwed": 0.1695402298850575
345
- },
346
- {
347
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
348
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
349
- "prediction": "hɪzkæpinwəsθɪninhæɡɹdinhizbjuɾiflbutswɹwɔɹnintʃæbi",
350
- "per": 0.1276595744680851,
351
- "pwed": 0.06499999999999999
352
- },
353
- {
354
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
355
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
356
- "prediction": "ðəɹiznzfɹðistaɪ",
357
- "per": 0.5862068965517241,
358
- "pwed": 0.4899425287356322
359
- },
360
- {
361
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
362
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
363
- "prediction": "ɹidʌkʃinmeɪfɔlfɑɹbəloʊɛkspɛkeɪ",
364
- "per": 0.21212121212121213,
365
- "pwed": 0.1553030303030303
366
- }
367
- ],
368
- "timestamp": "2024-12-12T16:07:25.389475"
369
- },
370
- {
371
- "task_id": "2e592612-ca38-4afb-a6a0-3c870b288960",
372
- "model": "snu-nia-12/wav2vec2-large_nia12_phone-ipa_english",
373
- "subset": "timit-test",
374
- "num_files": 1680,
375
- "average_per": 0.4847029843149011,
376
- "average_pwed": 0.2072006544586948,
377
- "detailed_results": [
378
- {
379
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
380
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
381
- "prediction": "ʃihædjʊrdɑrksutɪngrisiwɑʃwɔtərɔljɪrər",
382
- "per": 0.42424242424242425,
383
- "pwed": 0.15393518518518517
384
- },
385
- {
386
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
387
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
388
- "prediction": "doʊntæskmitɪkɛriənɔɪliræglaɪkðətdoʊndt",
389
- "per": 0.5,
390
- "pwed": 0.2623873873873874
391
- },
392
- {
393
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
394
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
395
- "prediction": "hɪzkæptənwɑzθɪnəndhægərdəndhɪzbjutəfəlbutswərwɔrnəndʃæbiiii",
396
- "per": 0.46808510638297873,
397
- "pwed": 0.2191091954022989
398
- },
399
- {
400
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
401
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
402
- "prediction": "ðərizənzfərðɪstaɪvsimdfulɪʃnaʊ",
403
- "per": 0.20689655172413793,
404
- "pwed": 0.054166666666666675
405
- },
406
- {
407
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
408
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
409
- "prediction": "prədəkʃənmeɪfɔlfɑrbɪloʊɛkspɛkteɪʃənzpzppppzpdtdtd",
410
- "per": 0.7272727272727273,
411
- "pwed": 0.34438775510204084
412
- }
413
- ],
414
- "timestamp": "2024-12-18T22:01:20.853274"
415
- },
416
- {
417
- "task_id": "d38e65ce-75b5-4dbf-8ade-bff6a5803790",
418
- "model": "vitouphy/wav2vec2-xls-r-300m-timit-phoneme",
419
- "subset": "timit-test",
420
- "num_files": 1680,
421
- "average_per": 0.2561961414705681,
422
- "average_pwed": 0.1378394393452702,
423
- "detailed_results": [
424
- {
425
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
426
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
427
- "prediction": "ʃihædjɝdɑɹksuɾɪngɹisiwɑʃwɑɾɝɑljiɝ",
428
- "per": 0.18181818181818182,
429
- "pwed": 0.13257575757575757
430
- },
431
- {
432
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
433
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
434
- "prediction": "doʊnæskmitɪkæɹiɪnɔɪliɹæglaɪkðæ",
435
- "per": 0.21428571428571427,
436
- "pwed": 0.10919540229885057
437
- },
438
- {
439
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
440
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
441
- "prediction": "hɪzkætɪnwəsθɪnənhægɝdɪnɪzbjuɾɪflbutswɝwɑɹnɪnʃæbi",
442
- "per": 0.19148936170212766,
443
- "pwed": 0.0576241134751773
444
- },
445
- {
446
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
447
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
448
- "prediction": "ðɪɹizənzfɝðɪsdaɪvsimdfulɪʃnaʊ",
449
- "per": 0.10344827586206896,
450
- "pwed": 0.03735632183908046
451
- },
452
- {
453
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
454
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
455
- "prediction": "pɹɝdəkʃɪnmeɪfɑlfɹbloʊɛkspɛteɪʃɪns",
456
- "per": 0.3333333333333333,
457
- "pwed": 0.12373737373737376
458
- }
459
- ],
460
- "timestamp": "2024-12-18T22:50:59.625872"
461
- },
462
- {
463
- "task_id": "2839c0c6-8f3b-426e-9eb7-04b6e133dc47",
464
- "model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa-plus-2000",
465
- "subset": "timit-test",
466
- "num_files": 1680,
467
- "average_per": 0.6479484324708775,
468
- "average_pwed": 0.18710002665151734,
469
- "detailed_results": [
470
- {
471
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
472
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
473
- "prediction": "ʂixadjodarksyːdɨnɡwisiwaːʃwarɒɔjiːr",
474
- "per": 0.6060606060606061,
475
- "pwed": 0.15404040404040406
476
- },
477
- {
478
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
479
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
480
- "prediction": "dondaːskmiːdɨkɛːɻjɒnojluiʋɻaːɡlɑjɡtaːn",
481
- "per": 0.8928571428571429,
482
- "pwed": 0.2146464646464646
483
- },
484
- {
485
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
486
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
487
- "prediction": "hizkaːptanustinanhagɛɻdɛnizbiurufubutswuɾʋoːɻninʂaːbi",
488
- "per": 0.5106382978723404,
489
- "pwed": 0.1096938775510204
490
- },
491
- {
492
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
493
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
494
- "prediction": "ðrisɔnsfrdɔsdaːjvsimtfulɛʂnɛ",
495
- "per": 0.5172413793103449,
496
- "pwed": 0.11063218390804598
497
- },
498
- {
499
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
500
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
501
- "prediction": "pɛdakɕɔnmɛjfaɔfarbuwɔwɛkspɛktajʂɔnt͡s",
502
- "per": 0.7272727272727273,
503
- "pwed": 0.15
504
- }
505
- ],
506
- "timestamp": "2024-12-18T23:29:27.320433"
507
- },
508
- {
509
- "task_id": "59afc37a-0072-44dd-a02a-0cf47d89c120",
510
- "model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
511
- "subset": "timit-test",
512
- "num_files": 1680,
513
- "average_per": 0.6417205190285036,
514
- "average_pwed": 0.19048963968896404,
515
- "detailed_results": [
516
- {
517
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
518
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
519
- "prediction": "ʂiharjoɖarksɯudenɡwisiwaːʂwarɔːjiːr",
520
- "per": 0.696969696969697,
521
- "pwed": 0.20580808080808083
522
- },
523
- {
524
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
525
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
526
- "prediction": "dɔndaːskmidɨkaːɻjɑno̞jwɯräːɡläikθaːn",
527
- "per": 0.8214285714285714,
528
- "pwed": 0.17338709677419356
529
- },
530
- {
531
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
532
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
533
- "prediction": "çizkatːɛnwɔstinanhaːɡɛɾdanɨzbirufubuswɔwoːɾnenʂaːbi",
534
- "per": 0.5531914893617021,
535
- "pwed": 0.1276595744680851
536
- },
537
- {
538
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
539
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
540
- "prediction": "ðɔriːzɔnsfɾdɔɕtaːivsimtfuøʃnɛu",
541
- "per": 0.5862068965517241,
542
- "pwed": 0.08764367816091957
543
- },
544
- {
545
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
546
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
547
- "prediction": "pɾɔdakʂɔnmɛjfaɔfaɾbuwɔuwɛkspɛktajʂons",
548
- "per": 0.7575757575757576,
549
- "pwed": 0.18806306306306303
550
- }
551
- ],
552
- "timestamp": "2024-12-19T07:41:18.132953"
553
- },
554
- {
555
- "task_id": "5517f6b2-6a76-4a2d-a6ce-33446f390c3b",
556
- "model": "ginic/gender_split_70_female_4_wav2vec2-large-xlsr-buckeye-ipa",
557
- "subset": "timit-test",
558
- "num_files": 1680,
559
- "average_per": 0.2810165988557621,
560
- "average_pwed": 0.10703377161801164,
561
- "detailed_results": [
562
- {
563
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
564
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
565
- "prediction": "ʃihædjɹ̩dɑɹksudɪnɡɹisiwɑʃwɑɾɹ̩ɔljiɹ",
566
- "per": 0.18181818181818182,
567
- "pwed": 0.07196969696969698
568
- },
569
- {
570
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
571
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
572
- "prediction": "doʊndæskmitɪkæɹiʌnɔɪliɹæɡlaɪkðæʔ",
573
- "per": 0.2857142857142857,
574
- "pwed": 0.14062500000000003
575
- },
576
- {
577
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
578
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
579
- "prediction": "hɪzkæptʌnwʌzθɪnhæɡɹ̩dɛnɪzbjuɾʌfl̩butswɹ̩wɔʊɹnɪnʃæbi",
580
- "per": 0.2978723404255319,
581
- "pwed": 0.09114583333333333
582
- },
583
- {
584
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
585
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
586
- "prediction": "ðʌɹizʌnzfɹ̩ðʌstaɪvsimtfulɪʃnaʊ",
587
- "per": 0.2413793103448276,
588
- "pwed": 0.014367816091954023
589
- },
590
- {
591
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
592
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
593
- "prediction": "pɹʌdʌkʃʌnmeɪfɔlfɑɹbʌloʊɛkspɛkteɪʃʌnz",
594
- "per": 0.30303030303030304,
595
- "pwed": 0.10532407407407407
596
- }
597
- ],
598
- "timestamp": "2024-12-20T13:45:52.009233"
599
- },
600
- {
601
- "task_id": "c2139f96-e79e-4f25-a525-aa039f65555f",
602
- "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.9.2WithoutSpaces",
603
- "subset": "timit-test",
604
- "num_files": 1680,
605
- "average_per": 0.9537775908999574,
606
- "average_pwed": 0.9351204819224959,
607
- "detailed_results": [
608
- {
609
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
610
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
611
- "prediction": "iɛ2",
612
- "per": 0.9696969696969697,
613
- "pwed": 0.9406565656565656
614
- },
615
- {
616
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
617
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
618
- "prediction": "iɛ2",
619
- "per": 0.9285714285714286,
620
- "pwed": 0.9285714285714286
621
- },
622
- {
623
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
624
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
625
- "prediction": "iɛ2",
626
- "per": 0.9787234042553191,
627
- "pwed": 0.9583333333333333
628
- },
629
- {
630
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
631
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
632
- "prediction": "iɛ2",
633
- "per": 0.9655172413793104,
634
- "pwed": 0.932471264367816
635
- },
636
- {
637
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
638
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
639
- "prediction": "iɛ2",
640
- "per": 0.9696969696969697,
641
- "pwed": 0.9406565656565656
642
- }
643
- ],
644
- "timestamp": "2024-12-20T14:21:32.290889"
645
- },
646
- {
647
- "task_id": "d146f1f1-6e6e-4b28-9420-c652ae9a1002",
648
- "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
649
- "subset": "timit-test",
650
- "num_files": 1680,
651
- "average_per": 0.9887075544197294,
652
- "average_pwed": 0.9692486915717254,
653
- "detailed_results": [
654
- {
655
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
656
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
657
- "prediction": "p",
658
- "per": 1.0,
659
- "pwed": 0.9747474747474747
660
- },
661
- {
662
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
663
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
664
- "prediction": "p",
665
- "per": 1.0,
666
- "pwed": 0.96875
667
- },
668
- {
669
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
670
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
671
- "prediction": "p",
672
- "per": 0.9787234042553191,
673
- "pwed": 0.9787234042553191
674
- },
675
- {
676
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
677
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
678
- "prediction": "p",
679
- "per": 1.0,
680
- "pwed": 0.9683908045977011
681
- },
682
- {
683
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
684
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
685
- "prediction": "p",
686
- "per": 0.9696969696969697,
687
- "pwed": 0.9696969696969697
688
- }
689
- ],
690
- "timestamp": "2024-12-20T15:26:27.658798"
691
- },
692
- {
693
- "task_id": "265c5859-e7ba-492d-a6c9-45733dc17c99",
694
- "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
695
- "subset": "timit-test",
696
- "num_files": 1680,
697
- "average_per": 0.9887075544197294,
698
- "average_pwed": 0.9692486915717254,
699
- "detailed_results": [
700
- {
701
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
702
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
703
- "prediction": "p",
704
- "per": 1.0,
705
- "pwed": 0.9747474747474747
706
- },
707
- {
708
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
709
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
710
- "prediction": "p",
711
- "per": 1.0,
712
- "pwed": 0.96875
713
- },
714
- {
715
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
716
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
717
- "prediction": "p",
718
- "per": 0.9787234042553191,
719
- "pwed": 0.9787234042553191
720
- },
721
- {
722
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
723
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
724
- "prediction": "p",
725
- "per": 1.0,
726
- "pwed": 0.9683908045977011
727
- },
728
- {
729
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
730
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
731
- "prediction": "p",
732
- "per": 0.9696969696969697,
733
- "pwed": 0.9696969696969697
734
- }
735
- ],
736
- "timestamp": "2024-12-20T15:40:51.631218"
737
- },
738
- {
739
- "task_id": "e297dfde-95e5-462b-a6e5-8fa43bc30bc0",
740
- "model": "speech31/wavlm-large-english-ipa",
741
- "subset": "timit-test",
742
- "num_files": 1680,
743
- "average_per": 0.3694017596969614,
744
- "average_pwed": 0.1356824900612308,
745
- "detailed_results": [
746
- {
747
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
748
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
749
- "prediction": "ʃihædjɔɹdɑɹksutɪnɡɹisiwɑʃwɔtɹ̩ɔljɪɹ",
750
- "per": 0.2727272727272727,
751
- "pwed": 0.11274509803921567
752
- },
753
- {
754
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
755
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
756
- "prediction": "dɑntæskmitəkæɹiænojliɹæɡlajkðæt",
757
- "per": 0.39285714285714285,
758
- "pwed": 0.13575268817204303
759
- },
760
- {
761
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
762
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
763
- "prediction": "hɪzkæpptənwɑzθɪændhæɡɹ̩dænhɪzbjutəfəlbutswɹ̩wɔɹnɪnʃæbi",
764
- "per": 0.3404255319148936,
765
- "pwed": 0.12980769230769232
766
- },
767
- {
768
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
769
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
770
- "prediction": "ðəɹizənzfɔɹðəsdajvsimdfulɪʃnaw",
771
- "per": 0.20689655172413793,
772
- "pwed": 0.051388888888888894
773
- },
774
- {
775
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
776
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
777
- "prediction": "pɹədʌkʃənmejffɔlfɔɑɹbɪlowɪkspɛktejʃənz",
778
- "per": 0.45454545454545453,
779
- "pwed": 0.16666666666666666
780
- }
781
- ],
782
- "timestamp": "2024-12-20T16:13:24.050232"
783
- },
784
- {
785
- "task_id": "efe95f71-05e3-485d-8e0c-1823a3037cf4",
786
- "model": "speech31/wavlm-large-english-ipa",
787
- "subset": "timit-test",
788
- "num_files": 1680,
789
- "average_per": 0.3694017596969614,
790
- "average_pwed": 0.1356824900612308,
791
- "detailed_results": [
792
- {
793
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
794
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
795
- "prediction": "ʃihædjɔɹdɑɹksutɪnɡɹisiwɑʃwɔtɹ̩ɔljɪɹ",
796
- "per": 0.2727272727272727,
797
- "pwed": 0.11274509803921567
798
- },
799
- {
800
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
801
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
802
- "prediction": "dɑntæskmitəkæɹiænojliɹæɡlajkðæt",
803
- "per": 0.39285714285714285,
804
- "pwed": 0.13575268817204303
805
- },
806
- {
807
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
808
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
809
- "prediction": "hɪzkæpptənwɑzθɪændhæɡɹ̩dænhɪzbjutəfəlbutswɹ̩wɔɹnɪnʃæbi",
810
- "per": 0.3404255319148936,
811
- "pwed": 0.12980769230769232
812
- },
813
- {
814
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
815
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
816
- "prediction": "ðəɹizənzfɔɹðəsdajvsimdfulɪʃnaw",
817
- "per": 0.20689655172413793,
818
- "pwed": 0.051388888888888894
819
- },
820
- {
821
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
822
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
823
- "prediction": "pɹədʌkʃənmejffɔlfɔɑɹbɪlowɪkspɛktejʃənz",
824
- "per": 0.45454545454545453,
825
- "pwed": 0.16666666666666666
826
- }
827
- ],
828
- "timestamp": "2024-12-20T16:26:47.980084"
829
- },
830
- {
831
- "task_id": "4b2ae2fc-fe2f-4f8b-9e8f-25c0bae13c0d",
832
- "model": "speech31/XLS-R-300m-english-ipa",
833
- "subset": "timit-test",
834
- "num_files": 1680,
835
- "average_per": 0.36382554692045954,
836
- "average_pwed": 0.1299702312124616,
837
- "detailed_results": [
838
- {
839
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
840
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
841
- "prediction": "ʃihædjɔɹdɑɹksutɪnɡɹisiwɑʃwɔtɹ̩ɔljɪɹ",
842
- "per": 0.2727272727272727,
843
- "pwed": 0.11274509803921567
844
- },
845
- {
846
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
847
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
848
- "prediction": "dɑntæskmitəkæɹiænojliɹæɡlajkðæt",
849
- "per": 0.39285714285714285,
850
- "pwed": 0.13575268817204303
851
- },
852
- {
853
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
854
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
855
- "prediction": "hɪzkæmptənwɑzθɪnændhæɡɹ̩dɪndhɪzbjutəfəlbutswɹ̩wɔɹnɪnʃæbi",
856
- "per": 0.3404255319148936,
857
- "pwed": 0.14583333333333334
858
- },
859
- {
860
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
861
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
862
- "prediction": "ðəɹɛzənzfɔɹðɪstajvsimdfulɪʃnaw",
863
- "per": 0.2413793103448276,
864
- "pwed": 0.052777777777777785
865
- },
866
- {
867
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
868
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
869
- "prediction": "pɹədʌkʃənmejfɔlfɑɹbɪlowɛkspɛktejʃənz",
870
- "per": 0.3939393939393939,
871
- "pwed": 0.11921296296296297
872
- }
873
- ],
874
- "timestamp": "2024-12-20T16:47:54.824174"
875
- },
876
- {
877
- "task_id": "33d387c0-703c-415d-b8e2-81cea87a2146",
878
- "model": "speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
879
- "subset": "timit-test",
880
- "num_files": 1680,
881
- "average_per": 0.44563344149564776,
882
- "average_pwed": 0.18844914029048124,
883
- "detailed_results": [
884
- {
885
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
886
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
887
- "prediction": "ʃihædjʊrdɑrksutɪngrisiwɑʃwɔtərɔljɪrr",
888
- "per": 0.3939393939393939,
889
- "pwed": 0.12976190476190474
890
- },
891
- {
892
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
893
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
894
- "prediction": "doʊntæskmitɪkɛriənɔɪliræglaɪkðətdnt",
895
- "per": 0.39285714285714285,
896
- "pwed": 0.19730392156862747
897
- },
898
- {
899
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
900
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
901
- "prediction": "hɪzkæptənwɑzθɪnəndhægərdəndhɪzbjutəfəlbutswərwɔrnɪnʃæbibæb",
902
- "per": 0.44680851063829785,
903
- "pwed": 0.20394736842105265
904
- },
905
- {
906
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
907
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
908
- "prediction": "ðərizənzfərðɪsstaɪvsimdfulɪʃnaʊa",
909
- "per": 0.27586206896551724,
910
- "pwed": 0.11328125
911
- },
912
- {
913
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
914
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
915
- "prediction": "prədəkʃənmeɪfɔlfɑrbɪloʊɛkspɛkteɪʃənzd",
916
- "per": 0.3939393939393939,
917
- "pwed": 0.13626126126126126
918
- }
919
- ],
920
- "timestamp": "2024-12-20T17:05:35.210786"
921
- },
922
- {
923
- "task_id": "c89bcefc-3884-435a-a54c-24297fe6f041",
924
- "model": "speech31/wav2vec2-large-TIMIT-IPA2",
925
- "subset": "timit-test",
926
- "num_files": 1680,
927
- "average_per": 0.4847029843149011,
928
- "average_pwed": 0.2072006544586948,
929
- "detailed_results": [
930
- {
931
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
932
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
933
- "prediction": "ʃihædjʊrdɑrksutɪngrisiwɑʃwɔtərɔljɪrər",
934
- "per": 0.42424242424242425,
935
- "pwed": 0.15393518518518517
936
- },
937
- {
938
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
939
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
940
- "prediction": "doʊntæskmitɪkɛriənɔɪliræglaɪkðətdoʊndt",
941
- "per": 0.5,
942
- "pwed": 0.2623873873873874
943
- },
944
- {
945
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
946
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
947
- "prediction": "hɪzkæptənwɑzθɪnəndhægərdəndhɪzbjutəfəlbutswərwɔrnəndʃæbiiii",
948
- "per": 0.46808510638297873,
949
- "pwed": 0.2191091954022989
950
- },
951
- {
952
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
953
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
954
- "prediction": "ðərizənzfərðɪstaɪvsimdfulɪʃnaʊ",
955
- "per": 0.20689655172413793,
956
- "pwed": 0.054166666666666675
957
- },
958
- {
959
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
960
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
961
- "prediction": "prədəkʃənmeɪfɔlfɑrbɪloʊɛkspɛkteɪʃənzpzppppzpdtdtd",
962
- "per": 0.7272727272727273,
963
- "pwed": 0.34438775510204084
964
- }
965
- ],
966
- "timestamp": "2024-12-20T22:50:50.641790"
967
- },
968
- {
969
- "task_id": "81fa94f8-94ae-4601-952c-24abaddaf691",
970
- "model": "ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
971
- "subset": "timit-test",
972
- "num_files": 1680,
973
- "average_per": 0.2807914104790719,
974
- "average_pwed": 0.10494355278037441,
975
- "detailed_results": [
976
- {
977
- "file": "data/TEST/DR1/FAKS0/SA1.WAV",
978
- "ground_truth": "ʃihædjɹdɑɹksuɾɪŋgɹisiwɑʃwɑɾɹʔɔljiɹ",
979
- "prediction": "ʃihædjɹdɑɹksuɾɪnɡɹisiwɔʃwɔɾɹ̩ɔljiɹ",
980
- "per": 0.18181818181818182,
981
- "pwed": 0.0744949494949495
982
- },
983
- {
984
- "file": "data/TEST/DR1/FAKS0/SA2.WAV",
985
- "ground_truth": "oʊnæsmitikɛɹiinɔɪliɹæglaɪkðæt",
986
- "prediction": "doʊndæskmidɪkæɹiɪnɔɪliɹæɡlaɪkðæʔ",
987
- "per": 0.32142857142857145,
988
- "pwed": 0.140625
989
- },
990
- {
991
- "file": "data/TEST/DR1/FAKS0/SI1573.WAV",
992
- "ground_truth": "hɪzkæpinwəsθɪnænhægɹdinɪzbjuɾuflbutswɹwɔɹninʃæbi",
993
- "prediction": "hɪzkæptʌnwʌzθɪnɛnhæɡɹ̩dɛnɪzbjuɾʌfl̩butswɹ̩wɔɹnɪnʃæbi",
994
- "per": 0.2553191489361702,
995
- "pwed": 0.05357142857142856
996
- },
997
- {
998
- "file": "data/TEST/DR1/FAKS0/SI2203.WAV",
999
- "ground_truth": "ðiɹizənzfɹðɪsdaɪvsimdfuliʃnaʊ",
1000
- "prediction": "ðʌɹizʌn̩zfɹðʌstaɪvsimtfulɪʃnaʊ",
1001
- "per": 0.2413793103448276,
1002
- "pwed": 0.014367816091954023
1003
- },
1004
- {
1005
- "file": "data/TEST/DR1/FAKS0/SI943.WAV",
1006
- "ground_truth": "ɹdʌkʃinmeɪfɔlfɑɹbəloʊəkspikeɪʃnts",
1007
- "prediction": "pɹʌdʌkʃn̩meɪfɔlfɑɹbʌloʊɛkspɛkteɪʃʌns",
1008
- "per": 0.30303030303030304,
1009
- "pwed": 0.12023809523809523
1010
- }
1011
- ],
1012
- "timestamp": "2024-12-21T01:31:04.859070"
1013
- }
1014
- ]
app/queue/tasks.json DELETED
@@ -1,237 +0,0 @@
-[
-    {
-        "id": "721b4c64-a825-42d3-bb0a-bdff9ee1ed0f",
-        "model": "facebook/wav2vec2-lv-60-espeak-cv-ft",
-        "subset": "timit-test",
-        "submission_name": "facebook espeak",
-        "github_url": "https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md",
-        "status": "completed",
-        "submitted_at": "2024-12-05T07:19:03.076292"
-    },
-    {
-        "id": "d6fe0956-b5b4-4105-835e-8dee1872ee4d",
-        "model": "KoelLabs/xlsr-timit-b0",
-        "subset": "timit-test",
-        "submission_name": "english phoneme model",
-        "github_url": "https://github.com/KoelLabs/",
-        "status": "completed",
-        "submitted_at": "2024-12-05T08:12:40.161444"
-    },
-    {
-        "id": "dbf4642a-fb13-402c-8a74-cc41fc4be599",
-        "model": "speech31/wav2vec2-large-TIMIT-IPA",
-        "subset": "timit-test",
-        "submission_name": "speech 31 model",
-        "github_url": "https://huggingface.co/speech31/wav2vec2-large-TIMIT-IPA2",
-        "status": "completed",
-        "submitted_at": "2024-12-05T09:13:45.315361"
-    },
-    {
-        "id": "4e3b80be-b255-47f2-b4ae-18a12e232e8a",
-        "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5",
-        "subset": "timit-test",
-        "submission_name": "Jubliano model",
-        "github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces/tree/d5312009d8e620b183c334dfdd9ffc6b4f06f8c1",
-        "status": "processing",
-        "submitted_at": "2024-12-05T09:36:14.571930"
-    },
-    {
-        "id": "912449a4-d7ed-4af4-b5be-5c2c57ec09ff",
-        "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5",
-        "subset": "timit-test",
-        "submission_name": "jubiliano model wav2vec2",
-        "github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces/tree/d5312009d8e620b183c334dfdd9ffc6b4f06f8c1",
-        "status": "completed",
-        "submitted_at": "2024-12-05T10:01:40.502935"
-    },
-    {
-        "id": "c79df17e-2bb2-4253-ae26-f7cc6ab21265",
-        "model": "facebook/wav2vec2-xlsr-53-espeak-cv-ft",
-        "subset": "timit-test",
-        "submission_name": "xlsr 53 model",
-        "github_url": "https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md",
-        "status": "completed",
-        "submitted_at": "2024-12-05T10:18:37.408664"
-    },
-    {
-        "id": "f36060e6-a746-44dc-a527-54995b270053",
-        "model": "ginic/hyperparam_tuning_1_wav2vec2-large-xlsr-buckeye-ipa",
-        "subset": "timit-test",
-        "submission_name": "ginic model wav2vec2 finetuned on buckeye",
-        "github_url": "https://huggingface.co/ginic/vary_individuals_old_only_1_wav2vec2-large-xlsr-buckeye-ipa",
-        "status": "completed",
-        "submitted_at": "2024-12-05T10:36:02.340422"
-    },
-    {
-        "id": "abf6c247-9faf-46ef-b0fa-25f2669da922",
-        "model": "KoelLabs/xlsr-timit-a0",
-        "subset": "timit-test",
-        "submission_name": "Koel Labs early version of finetuned model ",
-        "github_url": "https://github.com/KoelLabs/ML",
-        "status": "processing",
-        "submitted_at": "2024-12-05T11:08:23.663553"
-    },
-    {
-        "id": "47d56349-8111-4bda-a47f-e007dbedd36d",
-        "model": "KoelLabs/xlsr-timit-a0",
-        "subset": "timit-test",
-        "submission_name": "koel labs initial ",
-        "github_url": "https://github.com/KoelLabs/ML/",
-        "status": "completed",
-        "submitted_at": "2024-12-12T15:28:12.923626"
-    },
-    {
-        "id": "51dd5735-63bd-4fe5-a588-c0fc079076e0",
-        "model": "KoelLabs/xlsr-timit-a0",
-        "subset": "timit-test",
-        "submission_name": "koel labs initial ",
-        "github_url": "https://github.com/KoelLabs/ML/",
-        "status": "completed",
-        "submitted_at": "2024-12-12T15:53:07.620070"
-    },
-    {
-        "id": "2e592612-ca38-4afb-a6a0-3c870b288960",
-        "model": "snu-nia-12/wav2vec2-large_nia12_phone-ipa_english",
-        "subset": "timit-test",
-        "submission_name": "wav2vec2 ipa eng ",
-        "github_url": "",
-        "status": "completed",
-        "submitted_at": "2024-12-18T21:41:21.861322"
-    },
-    {
-        "id": "ac4cbe86-4dbe-4929-8f76-4d2052e0acf1",
-        "model": "vitouphy/wav2vec2-xls-r-300m-timit-phoneme",
-        "subset": "timit-test",
-        "submission_name": "fine-tuned version of facebook/wav2vec2-xls-r-300m on the Timit dataset",
-        "github_url": "https://www.kaggle.com/code/vitouphy/phoneme-recognition-with-wav2vec2",
-        "status": "processing",
-        "submitted_at": "2024-12-18T22:09:03.412372"
-    },
-    {
-        "id": "d38e65ce-75b5-4dbf-8ade-bff6a5803790",
-        "model": "vitouphy/wav2vec2-xls-r-300m-timit-phoneme",
-        "subset": "timit-test",
-        "submission_name": "fine-tuned version of facebook/wav2vec2-xls-r-300m on the Timit dataset",
-        "github_url": "https://www.kaggle.com/code/vitouphy/phoneme-recognition-with-wav2vec2",
-        "status": "completed",
-        "submitted_at": "2024-12-18T22:19:46.817373"
-    },
-    {
-        "id": "2839c0c6-8f3b-426e-9eb7-04b6e133dc47",
-        "model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa-plus-2000",
-        "subset": "timit-test",
-        "submission_name": "wav2vec2 model",
-        "github_url": "https://huggingface.co/ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
-        "status": "completed",
-        "submitted_at": "2024-12-18T22:55:36.734691"
-    },
-    {
-        "id": "59afc37a-0072-44dd-a02a-0cf47d89c120",
-        "model": "ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
-        "subset": "timit-test",
-        "submission_name": "wav2vec2 non-english transcription",
-        "github_url": "https://huggingface.co/ctaguchi/wav2vec2-large-xlsr-japlmthufielta-ipa1000-ns",
-        "status": "completed",
-        "submitted_at": "2024-12-18T23:47:03.488337"
-    },
-    {
-        "id": "e57eda9d-7a1d-4b41-9d47-a3d3839cac8b",
-        "model": "ginic/gender_split_70_female_4_wav2vec2-large-xlsr-buckeye-ipa",
-        "subset": "timit-test",
-        "submission_name": "phonetic transcription with the Buckeye corpus, from xlsr-53 model ",
-        "github_url": "https://github.com/ginic/multipa/tree/buckeye_experiments",
-        "status": "failed",
-        "submitted_at": "2024-12-19T11:48:26.415322",
-        "error": "Evaluation failed: (MaxRetryError(\"HTTPSConnectionPool(host='cdn-lfs-us-1.hf.co', port=443): Max retries exceeded with url: /repos/a4/b1/a4b11f4627350048e021a84d10b89320db54e02c54b2a9366228f8a05cda220b/120f5bc04d1df15143033c93e3ef358981775b529f17e0db11e58a1b80754e67?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27model.safetensors%3B+filename%3D%22model.safetensors%22%3B&Expires=1734889736&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTczNDg4OTczNn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zL2E0L2IxL2E0YjExZjQ2MjczNTAwNDhlMDIxYTg0ZDEwYjg5MzIwZGI1NGUwMmM1NGIyYTkzNjYyMjhmOGEwNWNkYTIyMGIvMTIwZjViYzA0ZDFkZjE1MTQzMDMzYzkzZTNlZjM1ODk4MTc3NWI1MjlmMTdlMGRiMTFlNThhMWI4MDc1NGU2Nz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoifV19&Signature=kfPD6ymEJuVvFZyuN3qL3xk4YJlpI5dqHgON4wJY-Mppwlp6x4Dw7cWdjEkJvMRF-bDuzNWQ3BEJPbsYouVW9WZMucDmxo38UwxSzIBhfWQxCYiHdUWuQPkypDUkI1mR3vbnCFQFXLiMQ2CgwWQz7q66OjIyq3suA00mhL2WcL8wvtovrfoEOkboEXCHCNLprfpoHpfoyfo~VS9~kmm61GN6SWbc9lzASIuT5FLkn~BJ6h405MgutQpNvrR4SHVLftk7rBmY8TAB3re5D0-9qFrMYb2Tk~9RKT3nxSNbgZVcEXzA5rYskcuGsrHoTuTTZ-NSW69K2M0IeivzFWTLNQ__&Key-Pair-Id=K24J24Z295AEI9 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x280544190>: Failed to establish a new connection: [Errno 51] Network is unreachable'))\"), '(Request ID: 14c9cc7c-47ee-47ae-b473-f4add807d233)')"
-    },
-    {
-        "id": "5517f6b2-6a76-4a2d-a6ce-33446f390c3b",
-        "model": "ginic/gender_split_70_female_4_wav2vec2-large-xlsr-buckeye-ipa",
-        "subset": "timit-test",
-        "submission_name": "phonetic transcription with the Buckeye corpus, from xlsr-53 model",
-        "github_url": "https://github.com/ginic/multipa/tree/buckeye_experiments",
-        "status": "completed",
-        "submitted_at": "2024-12-20T13:29:37.327317"
-    },
-    {
-        "id": "c2139f96-e79e-4f25-a525-aa039f65555f",
-        "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.9.2WithoutSpaces",
-        "subset": "timit-test",
-        "submission_name": "phonetic transcription",
-        "github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-INTERNATIONAL1.5WithoutSpaces",
-        "status": "completed",
-        "submitted_at": "2024-12-20T14:01:35.626112"
-    },
-    {
-        "id": "d146f1f1-6e6e-4b28-9420-c652ae9a1002",
-        "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
-        "subset": "timit-test",
-        "submission_name": "Jubliano xlsr model",
-        "github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-nl1.1",
-        "status": "completed",
-        "submitted_at": "2024-12-20T15:08:45.949389"
-    },
-    {
-        "id": "265c5859-e7ba-492d-a6c9-45733dc17c99",
-        "model": "Jubliano/wav2vec2-large-xls-r-300m-ipa-nl",
-        "subset": "timit-test",
-        "submission_name": "Jubliano xlsr model",
-        "github_url": "https://huggingface.co/Jubliano/wav2vec2-large-xls-r-300m-ipa-nl1.1",
-        "status": "completed",
-        "submitted_at": "2024-12-20T15:26:27.706187"
-    },
-    {
-        "id": "e297dfde-95e5-462b-a6e5-8fa43bc30bc0",
-        "model": "speech31/wavlm-large-english-ipa",
-        "subset": "timit-test",
-        "submission_name": "speech31 phoneme transcription english",
-        "github_url": "https://huggingface.co/speech31/wavlm-large-english-ipa",
-        "status": "completed",
-        "submitted_at": "2024-12-20T15:56:25.445806"
-    },
-    {
-        "id": "efe95f71-05e3-485d-8e0c-1823a3037cf4",
-        "model": "speech31/wavlm-large-english-ipa",
-        "subset": "timit-test",
-        "submission_name": "speech31 phoneme transcription english",
-        "github_url": "https://huggingface.co/speech31/wavlm-large-english-ipa",
-        "status": "completed",
-        "submitted_at": "2024-12-20T16:13:24.099308"
-    },
-    {
-        "id": "4b2ae2fc-fe2f-4f8b-9e8f-25c0bae13c0d",
-        "model": "speech31/XLS-R-300m-english-ipa",
-        "subset": "timit-test",
-        "submission_name": "speech31 xlsr model",
-        "github_url": "https://huggingface.co/speech31/XLS-R-300m-english-ipa",
-        "status": "completed",
-        "submitted_at": "2024-12-20T16:33:23.864360"
-    },
-    {
-        "id": "33d387c0-703c-415d-b8e2-81cea87a2146",
-        "model": "speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
-        "subset": "timit-test",
-        "submission_name": "model is a fine-tuned version of facebook/wav2vec2-large on the TIMIT dataset",
-        "github_url": "https://huggingface.co/speech31/wav2vec2-large-english-TIMIT-phoneme_v3",
-        "status": "completed",
-        "submitted_at": "2024-12-20T16:52:07.883839"
-    },
-    {
-        "id": "c89bcefc-3884-435a-a54c-24297fe6f041",
-        "model": "speech31/wav2vec2-large-TIMIT-IPA2",
-        "subset": "timit-test",
-        "submission_name": "fine-tuned version of facebook/wav2vec2-large on the None dataset",
-        "github_url": "https://huggingface.co/speech31/wav2vec2-large-TIMIT-IPA2",
-        "status": "completed",
-        "submitted_at": "2024-12-20T21:54:38.559569"
-    },
-    {
-        "id": "81fa94f8-94ae-4601-952c-24abaddaf691",
-        "model": "ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
-        "subset": "timit-test",
-        "submission_name": "ginic model, facebook/wav2vec2-large-xlsr-53 fine tuned",
-        "github_url": "https://huggingface.co/ginic/vary_individuals_young_only_3_wav2vec2-large-xlsr-buckeye-ipa",
-        "status": "completed",
-        "submitted_at": "2024-12-21T01:15:41.870875"
-    }
-]
app/tasks.py CHANGED
@@ -1,224 +1,117 @@
-# This modules handles the task queue, results, and leaderboard storage.
-
-import json
-import uuid
 from datetime import datetime
-from pathlib import Path
-from typing import Optional
-
-import asyncio
-import pandas as pd
-
-from inference import evaluate_model
-
-
-# Get absolute path
-CURRENT_DIR = Path(__file__).parent.absolute()
-
-# Constants
-QUEUE_DIR = CURRENT_DIR / "queue"
-PATHS = {
-    "tasks": QUEUE_DIR / "tasks.json",
-    "results": QUEUE_DIR / "results.json",
-    "leaderboard": QUEUE_DIR / "leaderboard.json",
-}
-
-
-# Handle storing and loading data from JSON files
-class StorageManager:
-    """Handles all JSON storage operations"""
-
-    def __init__(self, paths: dict[str, Path]):
-        self.paths = paths
-        self._ensure_directories()
-
-    def _ensure_directories(self):
-        """Ensure all necessary directories and files exist"""
-        for path in self.paths.values():
-            path.parent.mkdir(parents=True, exist_ok=True)
-            if not path.exists():
-                path.write_text("[]")
-
-    def load(self, key: str) -> list:
-        """Load JSON file"""
-        return json.loads(self.paths[key].read_text())
-
-    def save(self, key: str, data: list):
-        """Save data to JSON file"""
-        self.paths[key].write_text(
-            json.dumps(data, indent=4, default=str, ensure_ascii=False)
-        )
-
-    def update_task(self, task_id: str, updates: dict):
-        """Update specific task with new data"""
-        tasks = self.load("tasks")
-        for task in tasks:
-            if task["id"] == task_id:
-                task.update(updates)
-                break
-        self.save("tasks", tasks)
-
-
-# Initialize storage manager
-storage_manager = StorageManager(PATHS)
-
-
-# Export external functions
-def get_leaderboard_data():
-    """Return leaderboard data as DataFrame"""
-    try:
-        return pd.DataFrame(storage_manager.load("leaderboard"))
-    except Exception as e:
-        print(f"Error loading leaderboard: {e}")
-        return pd.DataFrame()
-
-
-def get_results():
-    """Return list of evaluation results"""
-    return storage_manager.load("results")
-
-
-def get_tasks():
-    """Return list of tasks"""
-    return storage_manager.load("tasks")
-
-
-def get_status(query: str) -> dict:
-    """Check status of a model evaluation task_id or model_name"""
-    if not query:
-        return {"error": "Please enter a model name or task ID"}
-
-    try:
-        results = get_results()
-        tasks = get_tasks()
-
-        # First try to find by task ID
-        result = next((r for r in results if r["task_id"] == query), None)
-        task = next((t for t in tasks if t["id"] == query), None)
-
-        # If not found, try to find by model name
-        if not result:
-            result = next((r for r in results if r["model"] == query), None)
-        if not task:
-            task = next((t for t in tasks if t["model"] == query), None)
-
-        if result:
-            # If we found results, return them
-            return {
-                "status": "completed",
-                "model": result["model"],
-                "subset": result["subset"],
-                "num_files": result["num_files"],
-                "average_per": result["average_per"],
-                "average_pwed": result["average_pwed"],
-                "detailed_results": result["detailed_results"],
-                "timestamp": result["timestamp"],
-            }
-        elif task:
-            # If we only found task status, return that
-            return task
-        else:
-            return {"error": f"No results found for '{query}'"}
-
-    except Exception as e:
-        print(f"Error checking status: {e}")
-        return {"error": f"Error checking status: {str(e)}"}
-
-
-def start_eval_task(
-    model_name: str, submission_name: str, github_url: Optional[str] = None
-) -> str:
-    """Start evaluation task in background. Returns task ID that can be used to check status."""
-
-    # Generate a task ID
-    task_id = str(uuid.uuid4())
-
-    # Create task entry
-    task = {
-        "id": task_id,
-        "model": model_name,
-        "subset": "test",
-        "submission_name": submission_name,
-        "github_url": github_url,
-        "status": "queued",
-        "submitted_at": datetime.now().isoformat(),
-    }
-
-    # Save task
-    tasks = storage_manager.load("tasks")
-    tasks.append(task)
-    storage_manager.save("tasks", tasks)
-
-    # Start evaluation in background
-    asyncio.run(_eval_task(task_id, model_name, submission_name, "test", github_url))
-
-    return task_id
-
-
-async def _eval_task(
-    task_id: str,
-    model_name: str,
-    submission_name: str,
-    subset: str = "test",
-    github_url: Optional[str] = None,
-    max_samples: Optional[int] = None,
-):
     """Background task to evaluate model and save updated results"""
     try:
         # Indicate task is processing
-        storage_manager.update_task(task_id, {"status": "processing"})
 
         # Evaluate model
-        result = evaluate_model(model_name, subset, max_samples)
-        avg_per = result["average_per"]
-        avg_pwed = result["average_pwed"]
 
         # Save results
-        print("Saving results...")
-        current_results = storage_manager.load("results")
-        current_results.append(result)
-        storage_manager.save("results", current_results)
-
-        # Update leaderboard
-        print("Updating leaderboard...")
-        leaderboard = storage_manager.load("leaderboard")
-        entry = next(
-            (e for e in leaderboard if e["submission_name"] == submission_name),
-            None,
-        )
-
-        if entry:
-            # Simply update with new scores
-            entry.update(
-                {
-                    "task_id": task_id,
-                    "average_per": avg_per,
-                    "average_pwed": avg_pwed,
-                    "model": model_name,
-                    "subset": subset,
-                    "github_url": github_url,
-                    "submission_date": datetime.now().isoformat(),
-                }
             )
-        else:
-            leaderboard.append(
-                {
-                    "task_id": task_id,
-                    "submission_id": str(uuid.uuid4()),
-                    "submission_name": submission_name,
-                    "model": model_name,
-                    "average_per": avg_per,
-                    "average_pwed": avg_pwed,
-                    "subset": subset,
-                    "github_url": github_url,
-                    "submission_date": datetime.now().isoformat(),
-                }
-            )
-
-        storage_manager.save("leaderboard", leaderboard)
-        storage_manager.update_task(task_id, {"status": "completed"})
-        print("Evaluation completed successfully")
 
     except Exception as e:
-        error_msg = f"Evaluation failed: {str(e)}"
-        print(error_msg)
-        storage_manager.update_task(task_id, {"status": "failed", "error": error_msg})

+# This module handles the task queue
+
+import os
+import multiprocessing
+from typing import TypedDict
 from datetime import datetime
+
+from metrics import per, fer
+from datasets import load_from_disk
+from hf import get_repo_info, add_leaderboard_entry
+from inference import clear_cache, load_model, transcribe
+
+leaderboard_lock = multiprocessing.Lock()
+
+
+class Task(TypedDict):
+    status: str
+    display_name: str
+    repo_id: str
+    repo_hash: str
+    repo_last_modified: datetime
+    submission_timestamp: datetime
+    url: str
+    error: str | None
+
+
+tasks: list[Task] = []
+
+
+def get_status(query: str) -> dict:
+    """Check status of an evaluation task by repo_id or repo_hash"""
+
+    query = query.strip().lower()
+    if not query:
+        return {"error": "Please enter a model id or task id"}
+
+    for task in reversed(tasks):
+        if task["repo_id"].lower() == query or task["repo_hash"].lower() == query:
+            return dict(task)
+
+    return {"error": f"No results found for '{query}'"}
+
+
+def start_eval_task(display_name: str, repo_id: str, url: str) -> str:
+    """Start evaluation task in background. Returns task ID that can be used to check status."""
+
+    repo_hash, last_modified = get_repo_info(repo_id)
+    # TODO: check if hash is different from the most recent submission if any for repo_id, otherwise don't recompute
+    task = Task(
+        status="submitted",
+        display_name=display_name,
+        repo_id=repo_id,
+        repo_hash=repo_hash,
+        repo_last_modified=last_modified,
+        submission_timestamp=datetime.now(),
+        url=url,
+        error=None,
+    )
+
+    manager = multiprocessing.Manager()
+    task_proxy = manager.dict(task)
+    tasks.append(task_proxy)  # type: ignore
+    multiprocessing.Process(
+        target=_eval_task, args=[task_proxy, leaderboard_lock]
+    ).start()
+
+    return repo_hash
+
+
+test_ds = load_from_disk(os.path.join(os.path.dirname(__file__), "data", "test"))
+
+
+def _eval_task(task: Task, leaderboard_lock):
     """Background task to evaluate model and save updated results"""
     try:
         # Indicate task is processing
+        task["status"] = "evaluating"
 
         # Evaluate model
+        average_per = 0
+        average_fer = 0
+        per_dataset_fers = {}
+
+        clear_cache()
+        model, processor = load_model(task["repo_id"])
+        for row in test_ds:
+            transcript = transcribe(row["audio"]["array"], model, processor)  # type: ignore
+            row_per = per(transcript, row["ipa"])  # type: ignore
+            row_fer = fer(transcript, row["ipa"])  # type: ignore
+            average_per += row_per
+            average_fer += row_fer
+            per_dataset_fers[row["dataset"]] = per_dataset_fers.get(row["dataset"], 0) + row_fer  # type: ignore
+        for key in per_dataset_fers.keys():
+            per_dataset_fers[key] /= len(test_ds.filter(lambda r: r["dataset"] == key))
+        average_per /= len(test_ds)
+        average_fer /= len(test_ds)
 
         # Save results
+        with leaderboard_lock:
+            add_leaderboard_entry(
+                display_name=task["display_name"],
+                repo_id=task["repo_id"],
+                repo_hash=task["repo_hash"],
+                repo_last_modified=task["repo_last_modified"],
+                submission_timestamp=task["submission_timestamp"],
+                average_per=average_per,
+                average_fer=average_fer,
+                url=task["url"],
+                per_dataset_fers=per_dataset_fers,
             )
+
+        # Mark task as complete
+        task["status"] = "completed"
     except Exception as e:
+        task["status"] = "failed"
+        task["error"] = str(e)
requirements.txt CHANGED
@@ -1,11 +1,17 @@
-# Core ML dependencies
-torch==2.0.1
-torchaudio==2.0.2
-transformers==4.44.2
-huggingface_hub==0.25.1
-gradio==5.12.0
-panphon==0.21.2
+# Huggingface
+huggingface_hub==0.34.4
+datasets==4.0.0
 
 # Data processing
 pandas==2.0.3
 numpy==1.25.2
+panphon==0.21.2
+torch==2.8.0
+torchaudio==2.8.0
+torchcodec==0.6.0
+transformers==4.56.0
+phonemizer==3.3.0
+
+# UI
+gradio==5.12.0
+protobuf==6.32.0
requirements_lock.txt CHANGED
@@ -1,28 +1,100 @@
-certifi==2024.12.14
-cfgv==3.4.0
-charset-normalizer==3.4.1
-distlib==0.3.8
-filelock==3.15.4
-fsspec==2024.12.0
-huggingface-hub==0.27.1
-identify==2.5.36
 idna==3.10
-ml_dtypes==0.5.0
-nodeenv==1.9.1
-numpy==2.1.3
-onnx==1.17.0
-onnxscript==0.1.0.dev20241223
-packaging==24.2
-platformdirs==4.2.2
-pre-commit==3.7.1
-protobuf==5.29.2
-PyYAML==6.0.1
-regex==2024.11.6
-requests==2.32.3
-safetensors==0.5.2
-tokenizers==0.21.0
 tqdm==4.67.1
-transformers==4.48.0
-typing_extensions==4.12.2
-urllib3==2.3.0
-virtualenv==20.26.3

+aiofiles==23.2.1
+aiohappyeyeballs==2.6.1
+aiohttp==3.12.15
+aiosignal==1.4.0
+annotated-types==0.7.0
+anyio==4.10.0
+async-timeout==5.0.1
+attrs==25.3.0
+babel==2.17.0
+certifi==2025.8.3
+charset-normalizer==3.4.3
+click==8.2.1
+colorama==0.4.6
+csvw==3.5.1
+datasets==4.0.0
+dill==0.3.8
+dlinfo==2.0.0
+editdistance==0.8.1
+exceptiongroup==1.3.0
+fastapi==0.116.1
+ffmpy==0.6.1
+filelock==3.19.1
+frozenlist==1.7.0
+fsspec==2025.3.0
+gradio==5.12.0
+gradio_client==1.5.4
+h11==0.16.0
+hf-xet==1.1.9
+httpcore==1.0.9
+httpx==0.28.1
+huggingface-hub==0.34.4
 idna==3.10
+isodate==0.7.2
+Jinja2==3.1.6
+joblib==1.5.2
+jsonschema==4.25.1
+jsonschema-specifications==2025.4.1
+language-tags==1.2.0
+markdown-it-py==4.0.0
+MarkupSafe==2.1.5
+mdurl==0.1.2
+mpmath==1.3.0
+multidict==6.6.4
+multiprocess==0.70.16
+munkres==1.1.4
+networkx==3.4.2
+numpy==1.25.2
+orjson==3.11.3
+packaging==25.0
+pandas==2.0.3
+panphon==0.21.2
+phonemizer==3.3.0
+pillow==11.3.0
+propcache==0.3.2
+protobuf==6.32.0
+pyarrow==21.0.0
+pydantic==2.11.7
+pydantic_core==2.33.2
+pydub==0.25.1
+Pygments==2.19.2
+pyparsing==3.2.3
+python-dateutil==2.9.0.post0
+python-multipart==0.0.20
+pytz==2025.2
+PyYAML==6.0.2
+rdflib==7.1.4
+referencing==0.36.2
+regex==2025.9.1
+requests==2.32.5
+rfc3986==1.5.0
+rich==14.1.0
+rpds-py==0.27.1
+ruff==0.12.11
+safehttpx==0.1.6
+safetensors==0.6.2
+segments==2.3.0
+semantic-version==2.10.0
+shellingham==1.5.4
+six==1.17.0
+sniffio==1.3.1
+starlette==0.47.3
+sympy==1.14.0
+tokenizers==0.22.0
+tomlkit==0.13.3
+torch==2.8.0
+torchaudio==2.8.0
+torchcodec==0.6.0
 tqdm==4.67.1
+transformers==4.56.0
+typer==0.17.3
+typing-inspection==0.4.1
+typing_extensions==4.15.0
+tzdata==2025.2
+unicodecsv==0.14.1
+uritemplate==4.2.0
+urllib3==2.5.0
+uvicorn==0.35.0
+websockets==14.2
+xxhash==3.5.0
+yarl==1.20.1
scripts/download_data_curl.sh DELETED
@@ -1,3 +0,0 @@
-# install ./.data/TIMIT.zip from https://www.kaggle.com/datasets/mfekadu/darpa-timit-acousticphonetic-continuous-speech?resource=download
-curl -L -o ./queue/data/TIMIT.zip \
-    https://www.kaggle.com/api/v1/datasets/download/mfekadu/darpa-timit-acousticphonetic-continuous-speech
scripts/download_data_lfs.sh DELETED
@@ -1,2 +0,0 @@
-# Download the TIMIT.zip dataset
-git lfs pull --include="./queue/data/TIMIT.zip"
scripts/install.sh DELETED
@@ -1,19 +0,0 @@
-# Create a virtual environment with Python 3.10
-python3.10 -m venv venv
-
-# Activate the virtual environment
-. ./venv/bin/activate
-
-# Install the required dependencies
-pip install -r requirements_lock.txt
-
-# Download data
-# check if git lfs is installed and run the appropriate script, otherwise run the curl script
-if [ -x "$(command -v git-lfs)" ]; then
-    . ./scripts/download_data_lfs.sh
-else
-    . ./scripts/download_data_curl.sh
-fi
-
-# Deactivate the virtual environment
-deactivate
scripts/run-dev.sh CHANGED
@@ -1,8 +1,2 @@
-# Activate the virtual environment
-. ./venv/bin/activate
-
 # Run the app with auto-reload enabled
 gradio app/app.py
-
-# Deactivate the virtual environment
-deactivate
scripts/run-prod.sh CHANGED
@@ -1,8 +1,2 @@
-# Activate the virtual environment
-. ./venv/bin/activate
-
 # Run the app without auto-reload
 python app/app.py
-
-# Deactivate the virtual environment
-deactivate
scripts/sample_test_set.py ADDED
@@ -0,0 +1,33 @@
+#!/usr/bin/env python3
+
+import os
+from datasets import load_dataset, concatenate_datasets, Dataset
+
+SEED = 42
+SAMPLE_SIZE = 100
+
+testsets: list[tuple[str, Dataset]] = [
+    ("TIMIT", load_dataset("KoelLabs/TIMIT")["test"]),
+    ("EpaDB", load_dataset("KoelLabs/EpaDB")["test"]),
+    ("PSST", load_dataset("KoelLabs/PSST")["test"]),
+    ("SpeechOcean", load_dataset("KoelLabs/SpeechOceanNoTH")["test"]),
+    ("ISLE", load_dataset("KoelLabs/ISLE")["train"]),
+]  # type: ignore
+
+all_datasets = []
+for name, test_ds in testsets:
+    shuffled_ds = test_ds.shuffle(seed=SEED)
+    sample_ds = shuffled_ds.select(range(SAMPLE_SIZE))
+    sample_ds = sample_ds.add_column("dataset", [name] * len(sample_ds))  # type: ignore
+    sample_ds = sample_ds.remove_columns(
+        [
+            col
+            for col in sample_ds.column_names
+            if col not in ["audio", "ipa", "dataset"]
+        ]
+    )
+    all_datasets.append(sample_ds)
+combined_ds: Dataset = concatenate_datasets(all_datasets)
+
+os.makedirs(os.path.join("app", "data"), exist_ok=True)
+combined_ds.save_to_disk(os.path.join("app", "data", "test"))
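The shuffle-then-select pattern in the sampling script is what makes the test set reproducible: the same seed always yields the same 100 rows per corpus. The same idea in plain Python, without the `datasets` dependency (the `sample_rows` helper is illustrative, not part of the script):

```python
import random

def sample_rows(rows, k, seed=42):
    """Deterministically sample k rows, mirroring shuffle(seed=...) then select(range(k))."""
    rng = random.Random(seed)  # dedicated RNG so global random state is untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)  # seeded in-place shuffle, so the result is reproducible
    return shuffled[:k]

rows = list(range(10))
assert sample_rows(rows, 3) == sample_rows(rows, 3)  # same seed -> same sample
```

Pinning the seed this way means the leaderboard's sampled test split can be regenerated byte-for-byte as long as the upstream dataset revisions stay fixed.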