import gradio as gr


def render_eval_info():
    text = r"""
The Iqra’Eval challenge provides a shared, transparent platform to benchmark phoneme-prediction systems on our open test set (“IqraEval/open_testset”).

**Submission Details**

– Submit a UTF-8 CSV named **teamName_submission.csv** with exactly two columns:

1. **ID**: utterance identifier (e.g. “0000_0001”)
2. **Labels**: your predicted phoneme sequence (space-separated)

```csv
ID,Labels
0000_0001,i n n a m a a y a …
0000_0002,m a a n a n s a …
```

**Evaluation Criteria**

– Leaderboard ranking is based on phoneme-level **F1-score**, computed via a two-stage (detection + diagnostic) hierarchy:

1. **Detection (error vs. correct)**
   - **TR (True Rejects)**: mispronounced phonemes correctly flagged
   - **FA (False Accepts)**: mispronunciations missed
   - **FR (False Rejects)**: correct phonemes wrongly flagged
   - **TA (True Accepts)**: correct phonemes correctly passed

   **Metrics:**
   - **Precision** = `TR / (TR + FR)`
   - **Recall** = `TR / (TR + FA)`
   - **F1** = `2 · Precision · Recall / (Precision + Recall)`

2. **Diagnostic (substitution/deletion/insertion errors)**
   See the **Metrics** tab for the breakdown into:
   - **DER**: Deletion Error Rate
   - **IER**: Insertion Error Rate
   - **SER**: Substitution Error Rate

– Once we receive your file (email: **[email protected]**), your submission is auto-evaluated and placed on the leaderboard.
"""
    return gr.Markdown(text, latex_delimiters=[{"left": "$", "right": "$", "display": True}])
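
The detection-stage formulas in the text above can be sketched as a small helper. This is a minimal sketch, not part of the official scorer; the function name `detection_f1` and the zero-division guards are my own additions, and the TR/FA/FR counts are assumed to come from an already-aligned comparison of predicted vs. reference phonemes:

```python
def detection_f1(tr: int, fa: int, fr: int) -> tuple[float, float, float]:
    """Detection-stage metrics, following the challenge description:
    TR = true rejects, FA = false accepts, FR = false rejects.
    Returns (precision, recall, f1), with 0.0 on empty denominators."""
    precision = tr / (tr + fr) if (tr + fr) else 0.0
    recall = tr / (tr + fa) if (tr + fa) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

For example, 8 mispronunciations correctly flagged with 2 missed and 2 false alarms gives precision, recall, and F1 of 0.8 each.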
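
A conforming submission file, as described in the Markdown above, can be produced with the standard-library `csv` module. This is an illustrative sketch: the `predictions` dictionary holds made-up rows taken from the example in the text, and the file name assumes a team literally called "teamName":

```python
import csv

# Hypothetical predictions: utterance ID -> space-separated phoneme sequence.
predictions = {
    "0000_0001": "i n n a m a a y a",
    "0000_0002": "m a a n a n s a",
}

# Write a UTF-8 CSV with exactly the two required columns.
with open("teamName_submission.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Labels"])
    for utt_id, labels in predictions.items():
        writer.writerow([utt_id, labels])
```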