Masoretic Hebrew to Yiddish (Yehoyesh) MarianMT Model
This model fine-tunes the Helsinki-NLP/opus-mt-mul-en MarianMT model for translation from the Masoretic Hebrew consonantal text of the Tanakh (Hebrew Bible) to the Yiddish translation by Yehoyesh (Yehoash Solomon Blumgarten, 1870-1927). The model is trained on a parallel corpus of the entire Tanakh, with Hebrew source and Yiddish target, both in Hebrew script.
Model Details
- Model Name: johnlockejrr/marianmt-he2yid-tanakh
- Base Model: Helsinki-NLP/opus-mt-mul-en
- Language Pair: Masoretic Hebrew → Yiddish (he2yid)
- Script: Both languages are written in Hebrew script (consonantal text for Hebrew, standard orthography for Yiddish)
- Domain: Biblical texts (Tanakh)
- License: MIT
Dataset
- Hebrew Source: Masoretic Hebrew consonantal text of the Tanakh (Torah, Neviʼim, Khetuvim)
- Yiddish Target: Yiddish translation of the Tanakh by Yehoyesh Shloyme (Yehoash Solomon) Blumgarten (1870-1927), as published in "Torah, Neviʼim, u-Khetuvim" (New York: Yehoʼash Farlag Gezelshaft, 1941)
- Alignment: Verse-aligned, covering the entire Tanakh
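Verse alignment can be pictured as pairing the two texts by verse reference. The sketch below is illustrative only: this card does not describe the corpus file format, so the `(book, chapter, verse)` key, the function name `align_verses`, and the dictionary layout are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical verse key: (book, chapter, verse), e.g. ("Genesis", 1, 1)
VerseRef = Tuple[str, int, int]


def align_verses(
    hebrew: Dict[VerseRef, str],
    yiddish: Dict[VerseRef, str],
) -> List[Dict[str, str]]:
    """Pair source and target verses that share the same reference."""
    pairs = []
    for ref in hebrew:  # dicts preserve insertion order
        if ref in yiddish:  # skip verses missing on the target side
            pairs.append({"source": hebrew[ref], "target": yiddish[ref]})
    return pairs
```

Each returned item holds one Hebrew verse and its Yiddish counterpart, which is the shape a sequence-to-sequence training set typically needs.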
Training Configuration
- Base Model: Helsinki-NLP/opus-mt-mul-en
- Batch Size: 4 per device, with gradient accumulation to raise the effective batch size
- Learning Rate: 1e-5
- Epochs: 100
- FP16: Enabled
- Language Prefix: Uses >>heb<< for Hebrew and >>yi<< for Yiddish
- Tokenizer: MarianMT tokenizer with added special tokens for language direction
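As a rough sketch, the hyperparameters listed above could be collected into keyword arguments for the `Seq2SeqTrainingArguments` class in `transformers`. The number of gradient accumulation steps is not stated in this card, so the default of 8 below is purely an assumption, as are the output directory and the helper names `build_training_kwargs` and `effective_batch_size`.

```python
def build_training_kwargs(gradient_accumulation_steps: int = 8) -> dict:
    """Collect the listed hyperparameters under Seq2SeqTrainingArguments names."""
    return {
        "output_dir": "marianmt-he2yid-tanakh",  # hypothetical path
        "per_device_train_batch_size": 4,        # from the card
        # Assumption: the card does not give the accumulation step count.
        "gradient_accumulation_steps": gradient_accumulation_steps,
        "learning_rate": 1e-5,                   # from the card
        "num_train_epochs": 100,                 # from the card
        "fp16": True,                            # from the card; needs a GPU
    }


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer step across all devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )
```

On a machine with a GPU, the dictionary can be unpacked as `Seq2SeqTrainingArguments(**build_training_kwargs())`. With the assumed 8 accumulation steps on one device, each optimizer step would cover 32 verse pairs.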
Usage
Inference Example
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_name = "johnlockejrr/marianmt-he2yid-tanakh"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Translate Hebrew to Yiddish
text = "בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ"
inputs = tokenizer(f">>heb<< {text}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Yiddish: {translation}")
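To translate several verses at once, the single-verse example can be extended to batches. This is a minimal sketch and not part of the original card: the helper name `translate_batch` and the batch size of 8 are assumptions, and the generation settings simply mirror the example above.

```python
from typing import List


def translate_batch(
    verses: List[str],
    tokenizer,
    model,
    prefix: str = ">>heb<<",
    batch_size: int = 8,  # assumption; tune to available memory
) -> List[str]:
    """Translate verses in chunks, using the same prefix as the single-verse example."""
    translations: List[str] = []
    for start in range(0, len(verses), batch_size):
        chunk = [f"{prefix} {verse}" for verse in verses[start:start + batch_size]]
        # Padding is required so verses of different lengths fit one tensor.
        inputs = tokenizer(
            chunk,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=512,
        )
        outputs = model.generate(**inputs, max_length=512, num_beams=4)
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

It would be called as `translate_batch(verses, tokenizer, model)` with the `tokenizer` and `model` objects loaded in the example above.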
Intended Use
- Primary: Automatic translation of Masoretic Hebrew Tanakh verses to Yiddish (Yehoyesh translation style)
- Research: Useful for digital humanities, comparative linguistics, and Jewish studies
- Education: Can assist in language learning and textual analysis
Limitations
- Context: The model is trained at the verse level and does not have document-level context
- Domain: Optimized for biblical text; may not generalize to Modern Hebrew or modern Yiddish
- Orthography: Hebrew is consonantal; Yiddish is in standard Yiddish orthography (Hebrew script)
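Because the Hebrew source is consonantal, pointed input (such as the vocalized verse in the inference example) may need to be reduced to consonants before translation. The sketch below shows one possible way to do that with the standard library. The helper name `strip_points` is hypothetical, and whether the training text kept punctuation such as maqaf or sof pasuq is not stated in this card, so the helper leaves punctuation untouched.

```python
import unicodedata


def strip_points(text: str) -> str:
    """Remove Hebrew vowel points and cantillation marks, keeping consonants."""
    # NFD splits precomposed presentation forms into base letter + marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Niqqud and cantillation are nonspacing marks (Unicode category "Mn").
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
```

The result can then be prefixed and tokenized as in the inference example, for instance `f">>heb<< {strip_points(text)}"`.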
Citation
If you use this model, please cite:
@misc{marianmt-he2yid-tanakh,
author = {John Locke Jr.},
title = {Masoretic Hebrew to Yiddish (Yehoyesh) MarianMT Model},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face model repository},
howpublished = {\url{https://huggingface.co/johnlockejrr/marianmt-he2yid-tanakh}},
}
Acknowledgements
- Yehoyesh Tanakh: Yiddish translation by Yehoyesh Shloyme (Yehoash Solomon) Blumgarten (1870-1927)
- Masoretic Text: Public domain sources
- Helsinki-NLP: For the base MarianMT model
License
MIT