---
library_name: transformers
language:
- fr
- de
- en
- it
- lb
license: agpl-3.0
tags:
- language-identification
- multilingual
- historical
- impresso
---

# Model Card for impresso-project/language-identifier

## Overview

`impresso-project/language-identifier` is a multilingual language identification model fine-tuned for use on historical newspaper content. It supports **German (de), French (fr), Italian (it), English (en), and Luxembourgish (lb)**, the core languages of the [Impresso Project](https://impresso-project.ch), which analyzes historical media across national and linguistic borders.

This model has been adapted for the short, OCR-noisy, and fragmentary inputs typical of digitized historical texts.

## Model Details

- **Model type:** Language identification
- **Interface:** Hugging Face `transformers` pipeline
- **Languages supported:** fr, de, en, it, lb
- **License:** AGPL-3.0
- **Developed by:** UZH, Switzerland
- **Training data:** Historical newspapers from the impresso corpus and related sources

## How to Use

```python
from transformers import pipeline

MODEL_NAME = "impresso-project/language-identifier"

# "langident" is not a built-in transformers task; trust_remote_code=True
# allows the pipeline code shipped in the model repository to be loaded.
lang_pipeline = pipeline(
    "langident",
    model=MODEL_NAME,
    trust_remote_code=True,
    device="cpu",
)

# A single-line French sample (no embedded newlines).
text = (
    "En l'an 1348, au plus fort des ravages de la peste noire à travers "
    "l'Europe, le Royaume de France se trouvait à la fois au bord du "
    "désespoir et face à une opportunité."
)

langs = lang_pipeline(text)
print(langs)
```

## Output Format

The output is a single dictionary with the predicted language and confidence score:

```python
{
    "language": "fr",
    "score": 1.0
}
```

## Use Cases

- Preprocessing for OCR and NLP tasks on historical corpora
- Document- and segment-level language tagging
- Filtering and sorting multilingual newspaper archives

## Limitations

- Works best on **sentence- or paragraph-length** texts
- May struggle with code-switching or OCR-degraded text that mixes languages
- Primarily optimized for **Impresso-like sources** (19th–20th century newspapers)

## Installation

```bash
pip install transformers floret
```

## Contact

- Website: [https://impresso-project.ch](https://impresso-project.ch)
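
## Example: Confidence Filtering

For the filtering and segment-level tagging use cases listed above, one possible approach is to keep a prediction only when its score clears a threshold. The sketch below is an illustration, not part of the model's API: `tag_segments`, `SUPPORTED`, and the `min_score` default of 0.9 are hypothetical names and values chosen for this example, and it assumes the classifier returns the `{"language": ..., "score": ...}` dictionary shown under Output Format.

```python
from typing import Callable, Dict, Iterable, List

# Language codes listed in this model card.
SUPPORTED = {"fr", "de", "en", "it", "lb"}


def tag_segments(
    texts: Iterable[str],
    classify: Callable[[str], Dict],
    min_score: float = 0.9,  # hypothetical threshold; tune on your own data
) -> List[Dict]:
    """Label each text, leaving the language as None when uncertain.

    `classify` is any callable that returns a dict with "language" and
    "score" keys (assumed here to match the documented output format).
    """
    results = []
    for text in texts:
        # Collapse newlines and repeated whitespace into single spaces.
        cleaned = " ".join(text.split())
        if not cleaned:
            # Nothing to classify: record an empty result.
            results.append({"text": text, "language": None, "score": 0.0})
            continue
        pred = classify(cleaned)
        confident = pred["score"] >= min_score and pred["language"] in SUPPORTED
        results.append(
            {
                "text": text,
                "language": pred["language"] if confident else None,
                "score": pred["score"],
            }
        )
    return results
```

With the pipeline from How to Use, a call might look like `tag_segments(segments, lang_pipeline)`, where `segments` is a list of strings. Segments left with `language` set to `None` can then be routed to manual review or to a fallback identifier.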
