Dhivehi GPT-2

A GPT-2 language model trained on Dhivehi text data for text generation.

Model Details

  • Architecture: GPT-2 (a configuration sketch follows this list)
  • Vocab Size: 32,000 tokens
  • Context Length: 1024 tokens
  • Embedding Size: 768
  • Layers: 12
  • Attention Heads: 12
  • Total Parameters: ~124M
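
The listed hyperparameters correspond to a standard GPT-2 configuration. The snippet below is a minimal sketch of how an equivalent model could be instantiated with the Hugging Face transformers API; it is illustrative only and not necessarily the exact code used to build this checkpoint.

```python
# Sketch only: build a GPT-2 model matching the hyperparameters listed above.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=32_000,   # Vocab Size
    n_positions=1024,    # Context Length
    n_embd=768,          # Embedding Size
    n_layer=12,          # Layers
    n_head=12,           # Attention Heads
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```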

Training

The model was trained on Dhivehi text data with the following configuration (a Trainer-style sketch of these settings follows the list):

  • Training Epochs: 3
  • Batch Size: 16 (4 per device with gradient accumulation of 4)
  • Learning Rate: 5e-4 with cosine decay
  • Weight Decay: 0.01
  • Warmup: 10% of training steps
  • Mixed Precision Training (FP16)
  • Early Stopping with patience of 3
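
For reference, these settings map roughly onto the Hugging Face Trainer API as sketched below. This is an assumption-laden sketch, not the actual training script: the dataset objects (train_ds, eval_ds), the output directory name, and the model variable are placeholders, and the argument names follow recent transformers versions.

```python
# Sketch only: approximate the training configuration listed above with the
# standard Hugging Face Trainer API. model, train_ds and eval_ds are
# placeholders; the real training script is not part of this card.
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="dv-articles-sm-gpt2",
    num_train_epochs=3,                   # Training Epochs
    per_device_train_batch_size=4,        # 4 per device ...
    gradient_accumulation_steps=4,        # ... x 4 accumulation = effective 16
    learning_rate=5e-4,
    lr_scheduler_type="cosine",           # cosine decay
    weight_decay=0.01,
    warmup_ratio=0.1,                     # warmup over 10% of training steps
    fp16=True,                            # mixed precision training
    eval_strategy="epoch",                # evaluation needed for early stopping
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```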

Usage

```python
from grapp import DhivehiGPT2Generator  # wrapper class used for generation


def simple_generate(prompt, model_path):
    # Load the Dhivehi GPT-2 checkpoint and return a single generated continuation.
    generator = DhivehiGPT2Generator(model_path)
    return generator.generate_text(prompt, max_length=200)[0]


# Example usage; the prompt roughly means "Headline: The Maldives' ..."
result = simple_generate("ސުރުޚީ: ރާއްޖޭގެ", "alakxender/dv-articles-sm-gpt2")
print(result)
```
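
If the grapp helper is not available, the checkpoint can also be loaded directly with the transformers API. The sketch below assumes the repository ships standard tokenizer files alongside the weights; the generation parameters are illustrative.

```python
# Sketch only: load the checkpoint with plain transformers, assuming standard
# tokenizer files are present in the model repo.
from transformers import AutoTokenizer, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("alakxender/dv-articles-sm-gpt2")
model = GPT2LMHeadModel.from_pretrained("alakxender/dv-articles-sm-gpt2")

inputs = tokenizer("ސުރުޚީ: ރާއްޖޭގެ", return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```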