🦉 Georgian KenLM Language Model (3-gram)

This is a KenLM 3-gram language model trained on Georgian (ქართული) text data, intended for use in automatic speech recognition (ASR) and other language modeling tasks.


🧾 Model Details

  • Language: Georgian (ka)
  • Model Type: KenLM n-gram
  • n-gram size: 3-gram
  • Format: .arpa
  • Tooling: KenLM

📂 Files

  • ge_model9.arpa – ARPA plaintext format
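
Because the model ships in ARPA plaintext format, its header can be inspected without loading the whole file. The following is a minimal sketch, assuming the file follows the usual ARPA layout (a `\data\` section with `ngram N=count` lines before the first `\1-grams:` section); the helper name `read_arpa_counts` is hypothetical and not part of KenLM.

```python
import re

def read_arpa_counts(path):
    """Return {order: count} parsed from the \\data\\ header of an ARPA file."""
    counts = {}
    in_header = False
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if not in_header:
                continue
            match = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                # The first section marker (e.g. \1-grams:) ends the header.
                break
    return counts

# Hypothetical usage, assuming the file has been downloaded locally:
# print(read_arpa_counts("ge_model9.arpa"))
```

For a 3-gram model the returned dictionary would be expected to have the keys 1, 2, and 3.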

📚 Training Data

The model was trained on a curated collection of Georgian text from various domains:

  • News articles
  • Subtitles
  • Books and web content

The data was cleaned, tokenized on whitespace, and normalized to standard Georgian orthography.
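
The exact cleaning pipeline is not documented here, so the following is only a sketch of what such a step might look like. It assumes that cleaning means Unicode NFC normalization, dropping every character outside the modern Georgian alphabet (U+10D0 to U+10F0), and collapsing whitespace; the function name `clean_line` and these specific rules are assumptions, not the author's method.

```python
import re
import unicodedata

# Assumption: keep only modern Georgian letters (U+10D0..U+10F0) and whitespace.
_NON_GEORGIAN = re.compile(r"[^\u10D0-\u10F0\s]")

def clean_line(text):
    """Normalize one line of text and return whitespace-separated Georgian tokens."""
    text = unicodedata.normalize("NFC", text)
    # Replace punctuation, digits, and foreign characters with spaces.
    text = _NON_GEORGIAN.sub(" ", text)
    # Whitespace tokenization: split on any run of whitespace, rejoin with single spaces.
    return " ".join(text.split())

print(clean_line("ეს,  არის ტესტი! 123"))
```

Text scored against the model should go through the same preprocessing that was applied at training time, whatever that turned out to be.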


💬 Intended Use

This model is suited to:

  • Beam search decoding in ASR systems (e.g., Whisper, DeepSpeech, Vosk)
  • Scoring and reranking ASR hypotheses
  • Basic text modeling or spelling correction in Georgian
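
Reranking ASR hypotheses with a language model can be sketched as follows. This is an illustration under stated assumptions: each hypothesis comes with an acoustic score where higher is better, and `lm_score` is any callable that returns a log-probability for a string. The names `rerank`, `lm_weight`, and `word_bonus` are hypothetical, and the weights would need tuning on held-out data.

```python
def rerank(hypotheses, lm_score, lm_weight=0.5, word_bonus=0.0):
    """Sort (text, acoustic_score) pairs by combined score, best first.

    combined = acoustic_score + lm_weight * lm_score(text) + word_bonus * word_count
    """
    def combined(item):
        text, acoustic = item
        return acoustic + lm_weight * lm_score(text) + word_bonus * len(text.split())

    return sorted(hypotheses, key=combined, reverse=True)

# Hypothetical usage with this model (requires kenlm and the ARPA file):
# import kenlm
# model = kenlm.Model("ge_model9.arpa")
# lm_score = lambda t: model.score(transliterate_georgian(t), bos=True, eos=True)
# best_text, _ = rerank(nbest_list, lm_score)[0]
```

Passing the scorer in as a callable keeps the reranking logic independent of KenLM, so it can be tested with a stand-in scorer.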

🧪 Example Usage

import kenlm

def transliterate_georgian(text):
    # One-to-one mapping from Georgian letters to the Latin scheme used in this example.
    # Case matters: for instance 'თ' maps to 'T' while 'ტ' maps to 't'.
    georgian_to_latin = {
        'ა': 'a', 'ბ': 'b', 'გ': 'g', 'დ': 'd', 'ე': 'e', 'ვ': 'v', 'ზ': 'z', 'თ': 'T', 'ი': 'i',
        'კ': 'k', 'ლ': 'l', 'მ': 'm', 'ნ': 'n', 'ო': 'o', 'პ': 'p', 'ჟ': 'J', 'რ': 'r', 'ს': 's',
        'ტ': 't', 'უ': 'u', 'ფ': 'f', 'ქ': 'q', 'ღ': 'R', 'ყ': 'y', 'შ': 'S', 'ჩ': 'C', 'ც': 'c',
        'ძ': 'Z', 'წ': 'w', 'ჭ': 'W', 'ხ': 'x', 'ჯ': 'j', 'ჰ': 'h'}

    # Characters without a mapping (spaces, digits, punctuation) pass through unchanged.
    return ''.join(georgian_to_latin.get(char, char) for char in text)

# Load the ARPA model and score a transliterated sentence.
model = kenlm.Model("ge_model9.arpa")
sentence = "ეს არის ტესტი"  # "This is a test"
# score() returns the log10 probability of the full sentence.
print(model.score(transliterate_georgian(sentence), bos=True, eos=True))
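
The score printed above is a log10 probability, which is hard to compare across sentences of different lengths. A common length-normalized alternative is perplexity. The sketch below converts a score to perplexity, assuming the score was computed with `bos=True, eos=True` so that the end-of-sentence token counts as one extra prediction; the function name `perplexity_from_score` is hypothetical.

```python
def perplexity_from_score(log10_score, sentence):
    """Convert a KenLM-style log10 sentence score to per-token perplexity."""
    # Number of predicted tokens: the words plus the end-of-sentence marker.
    n_tokens = len(sentence.split()) + 1
    return 10.0 ** (-log10_score / n_tokens)

# Hypothetical usage with the model loaded above:
# text = transliterate_georgian("ეს არის ტესტი")
# print(perplexity_from_score(model.score(text, bos=True, eos=True), text))
```

Lower values indicate text the model finds more predictable. The kenlm Python module also offers a `perplexity` method that should give a comparable number.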

Citation

@misc{georgian-kenlm,
  title={Georgian KenLM Language Model},
  author={Giorgi G},
  year={2025},
  howpublished={\url{https://huggingface.co/psyfreak/GEO-KenLM}}
}