# Floret Language Identification & OCR Quality Scoring

This repository provides a **simple and lightweight** way to:

- **Identify the language** of a given text using a pre-trained Floret model.
- **Score OCR quality** using predefined Bloom filters.

---

## Installation

Before using language identification or OCR scoring, install the required dependencies:

```sh
pip install floret fasttext huggingface_hub
pip install cython pybloomfiltermmap3
```

---

## Language Identification

To use the **Floret-based language detection model**, load it dynamically using `huggingface_hub`. The snippet downloads `floret_language_recognition.py` and executes it in the current namespace, which makes `floret_model` available:

```python
from huggingface_hub import hf_hub_download

# Download the script from the Hub and run it in the current namespace.
with open(hf_hub_download("Maslionok/sudo_pipelines", "floret_language_recognition.py")) as f:
    exec(f.read())
```

### **Usage**

Once loaded, call the model on a **plain text input** to detect its language:

```python
floret_model("this is a simple text")
```

**Output Example:**

```python
'en'
```

---

## OCR Quality Score Calculation

To assess OCR text quality, load the OCR scoring model the same way:

```python
from huggingface_hub import hf_hub_download

# Download the script from the Hub and run it in the current namespace.
with open(hf_hub_download("Maslionok/sudo_pipelines", "OCR_score.py")) as f:
    exec(f.read())
```

### **Usage**

Call `OCR_score()` on your text:

- **Automatic language detection:**

  ```python
  OCR_score("some OCR-extracted text")
  ```

- **Specify a language manually:**

  ```python
  OCR_score("some OCR-extracted text", language="en")
  ```
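
For intuition about how a Bloom-filter-based quality score can work, here is a minimal sketch. It is not the code in `OCR_score.py`: the `TinyBloom` class, the `ocr_quality` function, the hashing scheme, and the tokenization are all assumptions made for illustration. The idea is to store a vocabulary in a Bloom filter and report the fraction of tokens in the text that the filter recognizes.

```python
import hashlib
import re


class TinyBloom:
    """Hypothetical minimal Bloom filter (illustration only)."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every derived bit is set (false positives are possible).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def ocr_quality(text, known_words):
    """Fraction of lowercased word tokens found in the filter (0.0 if no tokens)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in known_words for token in tokens) / len(tokens)


# Assumed toy vocabulary; a real filter would hold a large word list per language.
vocab = TinyBloom()
for word in ["this", "is", "a", "simple", "text"]:
    vocab.add(word)

print(ocr_quality("this is a simple text", vocab))  # 1.0: Bloom filters have no false negatives
print(ocr_quality("th1s iz a s1mple t3xt", vocab))  # lower: garbled tokens are mostly unknown
```

A Bloom filter can report a word as present when it is not, so a score of this kind can slightly overestimate quality, but it never misses a word that was actually added.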