---
library_name: transformers
tags:
- translation
license: apache-2.0
language:
- en
- tr
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-tc-big-en-tr
pipeline_tag: translation
---

# Model Card for **yeniguno/marianmt-en-tr-kafkaesque**

A fine-tuned **MarianMT** model that translates **English prose into Turkish with a deliberate “Kafkaesque” flavour**. The checkpoint starts from the bilingual **Helsinki-NLP/opus-mt-tc-big-en-tr** base model and is further trained on ~10k parallel sentences taken from published Turkish and English editions of Franz Kafka’s works.

The goal was purely experimental:

> *Can a compact MT model be nudged toward a specific literary voice by exposing it to a small, style-consistent corpus?*

---

## Model Details

| | |
|---|---|
| **Base architecture** | MarianMT (Transformer encoder-decoder) |
| **Source language** | `en` (contemporary English) |
| **Target language** | `tr` (modern Turkish) |
| **Training corpus** | 10 014 sentence pairs manually aligned from Turkish editions of Kafka’s short stories and their authorised English translations |
| **Framework** | 🤗 Transformers ≥ 4.40 |
| **License** | Apache-2.0 for the *model code + weights* ✧ ⚠️ Translations used for fine-tuning may still be under copyright; see *“Data & Copyright”* below |

---

## Intended Uses & Scope

| **You can** | **You should not** |
|-------------|--------------------|
| Generate *draft* Turkish renderings of Kafka excerpts originally translated into English | Assume output is authoritative or publication-ready |
| Explore style-transfer / literary MT research | Rely on the model for technical, legal or medical translation |
| Use as a starting point for further stylistic fine-tuning | Expect high accuracy outside Kafka’s narrative domain |

---

## Training Procedure

* **Hardware:** 1× A100 40 GB (Google Colab Pro)
* **Hyper-parameters:** 5 epochs, effective batch size 16, LR 5 × 10⁻⁵, linear decay, 200 warm-up steps
* **Early stopping:** patience 3 (evaluated every 500 steps), monitored on
BLEU
* **Best checkpoint:** step 2 500
* **Losses:** train ≈ 0.42 → validation ≈ 1.01
* **SacreBLEU** (500-sentence dev set): **baseline 24.4 → tuned 31.8**

---

## Quick Start

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "yeniguno/marianmt-en-tr-kafkaesque"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

english_text = "My neighbor, at the same peculiar hour each night, left his room with a small, locked bag in hand."

# Tokenize the English source and generate the Turkish translation
inputs = tokenizer(english_text, return_tensors="pt", padding=True)
output_ids = model.generate(**inputs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
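
---

## Fine-tuning Sketch

The hyper-parameters listed under *Training Procedure* can be wired together roughly as below. This is a minimal, unverified sketch, not the exact script used for this checkpoint: the tokenized datasets (`train_ds`, `eval_ds`) and the `compute_bleu` metric function are placeholders you must supply, and argument names assume a recent 🤗 Transformers release (older versions spell `eval_strategy` as `evaluation_strategy`).

```python
from transformers import (
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    MarianMTModel,
    MarianTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base = "Helsinki-NLP/opus-mt-tc-big-en-tr"
tokenizer = MarianTokenizer.from_pretrained(base)
model = MarianMTModel.from_pretrained(base)

args = Seq2SeqTrainingArguments(
    output_dir="marianmt-en-tr-kafkaesque",
    num_train_epochs=5,
    per_device_train_batch_size=16,   # effective batch size 16 on a single A100
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_steps=200,
    eval_strategy="steps",
    eval_steps=500,                   # evaluate every 500 steps
    save_strategy="steps",
    save_steps=500,
    predict_with_generate=True,       # decode with generate() so BLEU can be scored
    load_best_model_at_end=True,
    metric_for_best_model="bleu",     # assumes compute_metrics returns {"bleu": ...}
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,           # placeholder: tokenized en→tr sentence pairs
    eval_dataset=eval_ds,             # placeholder: the 500-sentence dev set
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    compute_metrics=compute_bleu,     # placeholder: SacreBLEU wrapper
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```

With `load_best_model_at_end=True` and BLEU as the monitored metric, the trainer restores the best-scoring checkpoint (step 2 500 in this run) once early stopping triggers.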