Greek and Latin Author Classifier

This model distinguishes the names of authors who wrote primarily in Ancient Greek from the names of authors who wrote primarily in Latin.

The specific purpose of this model is to assist in processing bibliographic metadata about editions of Latin texts.

Most critical editions of Ancient Greek texts bear Latin versions of the work's title and the author's name. For example, Hesiod's Theogony is Hesiodi Theogonia instead of Ἡσιόδου Θεογονία. Consequently, these works are often cataloged under the subject "Latin" by catalogers doing their best with languages they do not understand.

As a result, metadata records tagged with the subject "Latin" in collections such as the HathiTrust Digital Library inevitably include Greek works.

Since the Digital Latin Library is interested only in records of Latin works, we need a good way of winnowing out the Greek editions. This model does a good job of that.
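The winnowing step described above might look something like the following sketch. It assumes the model can be loaded with the Hugging Face `transformers` text-classification pipeline, that each metadata record is a dict with an `author` field, and that the model's label for Latin authors is the string `"Latin"`. The record layout, the helper names, and the label string are all illustrative assumptions; check the `id2label` mapping in the model's configuration for the actual label names.

```python
from typing import Callable, Dict, Iterable, List

MODEL_ID = "sjhuskey/distilbert_multilingual_cased_greek_latin_classifier"


def load_classifier(model_id: str = MODEL_ID) -> Callable[[str], str]:
    """Return a function that maps an author name to the model's predicted label."""
    # Imported here so the filtering logic below works without transformers installed.
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_id)

    def classify(name: str) -> str:
        # For a single string the pipeline returns a list with one result dict.
        return clf(name)[0]["label"]

    return classify


def keep_latin(
    records: Iterable[Dict[str, str]],
    classify: Callable[[str], str],
    latin_label: str = "Latin",  # assumed label name, not confirmed by the model card
) -> List[Dict[str, str]]:
    """Keep only the records whose author is classified as a Latin author."""
    return [r for r in records if classify(r["author"]) == latin_label]
```

With the model downloaded, `keep_latin(records, load_classifier())` would return the subset of records to retain. Passing the classifier in as a plain function keeps the filter easy to test with a stand-in.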

Emissions

Here is the codecarbon output from training on Google Colab with an A100 runtime:

timestamp: 2025-06-12T17:07:28
project_name: codecarbon
run_id: e9ec6b22-3102-4fb8-80bb-cf1a15740b8b
duration: 1348.9958429336548
emissions: 0.0091357690759099
emissions_rate: 6.772273705486325e-06
cpu_power: 42.5
gpu_power: 0.0
ram_power: 9.000000000000002
cpu_energy: 0.0159251248127884
gpu_energy: 0
ram_energy: 0.0033721966546773
energy_consumed: 0.0192973214674658
country_name: United States
country_iso_code: USA
os: macOS-15.5-arm64-arm-64bit
python_version: 3.10.9
codecarbon_version: 2.2.2
cpu_count: 12
cpu_model: Apple M4 Pro
ram_total_size: 24.0
tracking_mode: machine
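
The fields above are related by simple arithmetic, which the following sketch works through. It assumes codecarbon's usual units (duration in seconds, power in watts, energy in kWh, emissions in kg CO2eq). The power fields are point readings rather than averages, so energy recomputed from power and duration is only expected to agree approximately with the reported energy.

```python
# Values copied from the codecarbon output above.
run = {
    "duration": 1348.9958429336548,
    "emissions": 0.0091357690759099,
    "emissions_rate": 6.772273705486325e-06,
    "cpu_power": 42.5,
    "gpu_power": 0.0,
    "ram_power": 9.000000000000002,
    "cpu_energy": 0.0159251248127884,
    "gpu_energy": 0.0,
    "ram_energy": 0.0033721966546773,
    "energy_consumed": 0.0192973214674658,
}


def derived(run):
    """Recompute the aggregate fields from their components."""
    hours = run["duration"] / 3600
    return {
        # Total energy is the sum of the per-component energies (kWh).
        "energy_consumed": run["cpu_energy"] + run["gpu_energy"] + run["ram_energy"],
        # Emissions rate is total emissions divided by duration (kg CO2eq per second).
        "emissions_rate": run["emissions"] / run["duration"],
        # Energy estimated as power (W) x time (h) / 1000, in kWh; approximate only.
        "cpu_energy": run["cpu_power"] * hours / 1000,
        "ram_energy": run["ram_power"] * hours / 1000,
    }


for field, value in derived(run).items():
    print(f"{field}: reported={run[field]:.6g} recomputed={value:.6g}")
```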

Model: sjhuskey/distilbert_multilingual_cased_greek_latin_classifier
