janisrebekahv/finetuned-colloquial-tamil

📌 Model Overview

This is a fine-tuned version of suriya7/English-to-Tamil, trained to produce colloquial Tamil translations instead of formal Tamil.

✅ Translates English → Colloquial Tamil
✅ Incorporates slang, informal speech, and real-world phrasing
✅ Useful for chatbots, conversational AI, and social media applications


📜 Dataset

🔹 Custom Dataset Used for Fine-Tuning:
📂 janisrebekahv/colloquial_tamil
This dataset was curated specifically to train this model, with the goal of improving the accuracy of English-to-Colloquial-Tamil translation. It combines three sources:

1️⃣ jarvisvasu/english-to-colloquial-tamil – A publicly available dataset for informal Tamil translations.
2️⃣ YouTube Comments Dataset (Custom-Created) – Extracted using the YouTube API and manually converted to colloquial Tamil for authenticity.
3️⃣ ChatGPT-Generated Data – Additional colloquial Tamil phrases aligned with natural speech patterns.

Total dataset size: 16,269 sentence pairs
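The card does not document how the three sources were combined. As a minimal sketch, assuming each source has already been loaded as a list of (English, colloquial Tamil) string pairs, merging them could involve normalising whitespace, dropping empty or exact-duplicate pairs, and holding out a validation split. The helpers `merge_sources` and `train_val_split`, the 10% validation fraction, and the seed are hypothetical and are not taken from the author's pipeline.

```python
import random
from typing import Iterable

# Hypothetical record type: one (English, colloquial Tamil) sentence pair.
Pair = tuple[str, str]


def merge_sources(*sources: Iterable[Pair]) -> list[Pair]:
    """Concatenate several lists of pairs, collapsing whitespace and
    skipping empty or exact-duplicate pairs while keeping first-seen order."""
    seen: set[Pair] = set()
    merged: list[Pair] = []
    for source in sources:
        for english, tamil in source:
            # Collapse runs of whitespace and trim both ends of each side.
            pair = (" ".join(english.split()), " ".join(tamil.split()))
            if not pair[0] or not pair[1] or pair in seen:
                continue
            seen.add(pair)
            merged.append(pair)
    return merged


def train_val_split(
    pairs: list[Pair], val_fraction: float = 0.1, seed: int = 42
) -> tuple[list[Pair], list[Pair]]:
    """Shuffle a copy of the pairs with a fixed seed and hold out
    the first `val_fraction` of them as a validation set."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

Deduplication here only removes identical pairs: the same English sentence with two different colloquial renderings is kept, which may be desirable when several informal phrasings are equally valid.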


🔥 Example Usage

Load and test the model using Hugging Face Transformers:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load model and tokenizer
model_name = "janisrebekahv/finetuned-colloquial-tamil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Function to translate text
def translate(text):
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example translations
test_sentences = [
    "This is so beautiful",
    "Bro, are you coming or not?",
    "My mom is gonna kill me if I don't reach home now!"
]

for sentence in test_sentences:
    print(f"English: {sentence}")
    print(f"Colloquial Tamil: {translate(sentence)}\n")
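In the example above, `generate` is capped at `max_length=128` tokens, and the model was trained on sentence pairs, so a long paragraph passed in a single call may come back cut short. One possible workaround, sketched here under that assumption, is to split the input into sentences and translate each one with any `str -> str` function, such as the `translate` helper defined above. The splitter is a simple regex heuristic (it will, for example, split after abbreviations such as "Dr."), and `split_sentences` and `translate_long` are hypothetical names.

```python
import re
from typing import Callable

# Heuristic: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on end punctuation followed by whitespace."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def translate_long(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate each sentence separately and join the results with spaces."""
    return " ".join(translate_fn(sentence) for sentence in split_sentences(text))
```

With the model and tokenizer loaded as in the example, usage would be `translate_long(paragraph, translate)`. Translating sentence by sentence loses cross-sentence context, so pronouns or tone that depend on earlier sentences may be rendered less naturally.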

Model size: 484M parameters (Safetensors, F32 tensors)