T5-dhivehi-typo-corrector-asr

This model is a fine-tuned version of t5-small designed to correct typographic and transcription errors in Dhivehi text, especially those produced by automatic speech recognition (ASR) systems. It is optimized for cleaning up ASR output and may not perform reliably on general-purpose text correction or on inputs other than Dhivehi ASR transcripts. For best results, use it only to post-process Dhivehi ASR output.

Usage

You can use this model to correct ASR-generated text from Dhivehi audio. Here's an example using Hugging Face Transformers:

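from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("alakxender/t5-dhivehi-typo-corrector-asr")
model = AutoModelForSeq2SeqLM.from_pretrained("alakxender/t5-dhivehi-typo-corrector-asr")

# Raw ASR output to correct
input_text = "މަސްދޫކޮށް ފަހަރަކު"

# Prepend the "fix: " task prefix and tokenize into PyTorch tensors
input_ids = tokenizer("fix: " + input_text, return_tensors="pt").input_ids

# Generate the corrected sequence and decode it back to text
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))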
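
For more than one transcript at a time, the single-sentence example above can be wrapped in a small batch helper. The following is a minimal sketch, not part of the model card: the names build_prompts and correct_batch are hypothetical, and the max_new_tokens=64 and num_beams=4 defaults are illustrative assumptions rather than settings recommended by the model author. Only the "fix: " prefix comes from the example above.

```python
PREFIX = "fix: "  # task prefix taken from the usage example


def build_prompts(lines):
    """Strip whitespace, drop empty lines, and prepend the task prefix."""
    return [PREFIX + line.strip() for line in lines if line.strip()]


def correct_batch(lines, tokenizer, model, max_new_tokens=64, num_beams=4):
    """Correct several ASR lines in one generate() call.

    `tokenizer` and `model` are the objects loaded with
    AutoTokenizer / AutoModelForSeq2SeqLM in the example above.
    The generation settings are assumptions, not tuned values.
    """
    prompts = build_prompts(lines)
    if not prompts:
        return []
    # Pad to the longest prompt so the batch forms a rectangular tensor
    enc = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(
        **enc, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the tokenizer and model already loaded, a call would look like correct_batch(asr_lines, tokenizer, model), where asr_lines is a list of raw ASR strings. Empty lines are skipped, so the returned list can be shorter than the input.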

Performance

  • Final Validation Loss: 0.3487
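
One way to read the loss figure is to convert it to perplexity. The sketch below assumes the reported value is a mean per-token cross-entropy in nats, which is what standard seq2seq fine-tuning typically reports; the model card does not state this explicitly.

```python
import math

# Reported final validation loss (from the line above)
val_loss = 0.3487

# Assumption: the loss is mean token-level cross-entropy in nats,
# so perplexity is its exponential.
perplexity = math.exp(val_loss)
print(f"perplexity = {perplexity:.3f}")
```

Under that assumption the perplexity comes out at roughly 1.42. This is a property of the validation set's target tokens, not a direct measure of correction accuracy.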
