T5-dhivehi-typo-corrector-asr

This model is a fine-tuned version of t5-small designed to correct typographic and transcription errors in Dhivehi text, especially those produced by automatic speech recognition (ASR) systems. It is optimized for cleaning up ASR output and may not perform reliably on general-purpose text correction or on any input outside Dhivehi ASR error correction. For best results, use it only to post-process Dhivehi ASR output.

Usage

You can use this model to correct ASR-generated text from Dhivehi audio. Here's an example using Hugging Face Transformers:

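from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the fine-tuned seq2seq model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("alakxender/t5-dhivehi-typo-corrector-asr")
model = AutoModelForSeq2SeqLM.from_pretrained("alakxender/t5-dhivehi-typo-corrector-asr")

# Raw ASR output to correct; prepend the "fix: " task prefix before tokenizing
input_text = "މަސްދޫކޮށް ފަހަރަކު"
input_ids = tokenizer("fix: " + input_text, return_tensors="pt").input_ids

# Generate the corrected sequence and decode it back to a string
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))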
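When post-processing a whole transcript, it may be more convenient to correct several ASR lines at once. The following is a minimal sketch, not part of the original model card: the helper names (`build_prompts`, `chunked`, `correct_batch`, `load_corrector`) and the default values for `batch_size` and `max_new_tokens` are illustrative assumptions. Only the model ID and the "fix: " prefix come from the example above.

```python
from typing import Iterable, Iterator, List

# Both values are taken from the single-input example above.
MODEL_ID = "alakxender/t5-dhivehi-typo-corrector-asr"
PREFIX = "fix: "


def build_prompts(texts: Iterable[str], prefix: str = PREFIX) -> List[str]:
    """Strip each ASR line, drop empty ones, and prepend the task prefix."""
    return [prefix + t.strip() for t in texts if t.strip()]


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def correct_batch(texts, tokenizer, model, batch_size=8, max_new_tokens=64):
    """Correct many ASR lines; batch_size and max_new_tokens are assumed defaults."""
    corrected: List[str] = []
    for batch in chunked(build_prompts(texts), batch_size):
        # Pad to the longest prompt in the batch so the tensors are rectangular.
        enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        out = model.generate(**enc, max_new_tokens=max_new_tokens)
        corrected.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return corrected


def load_corrector(model_id: str = MODEL_ID):
    """Load tokenizer and model; imported lazily so the helpers work without it."""
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model
```

A typical call would be `tokenizer, model = load_corrector()` followed by `correct_batch(asr_lines, tokenizer, model)`. Because empty lines are dropped before generation, the returned list can be shorter than the input.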

Performance

Final Validation Loss: 0.3487
