# Floret Language Identification & OCR Quality Scoring

This repository provides a lightweight way to:
- **Identify the language** of a given text using a pre-trained Floret model.
- **Score the OCR quality** of a text using predefined Bloom filters.

---

## Installation
Before using language identification or OCR scoring, install the required dependencies:

```sh
pip install floret fasttext huggingface_hub
pip install cython pybloomfiltermmap3
```

---

## Language Identification
To use the **Floret-based language detection model**, download its loader script with `huggingface_hub` and execute it. This defines `floret_model` in the current namespace:

```python
from huggingface_hub import hf_hub_download
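# Download the loader script from the Hub (cached locally after the first call).
script_path = hf_hub_download("Maslionok/sudo_pipelines", "floret_language_recognition.py")

# Execute the script in the current namespace; this runs the downloaded code.
with open(script_path) as f:
    exec(f.read())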
```
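
If you would rather not `exec` the script into your own namespace, the downloaded file can be imported as a regular module instead. The following is a minimal sketch, not part of this repository: `load_module_from_path` is a hypothetical helper name, and it assumes the script defines `floret_model` at module level and does not depend on names from the caller's namespace.

```python
import importlib.util


def load_module_from_path(path, module_name):
    """Import the Python file at `path` as a module named `module_name`.

    Hypothetical helper: the file's code is still executed, but its names
    stay inside the returned module object instead of the caller's globals.
    """
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Usage sketch (requires network access, so it is left commented out):
# from huggingface_hub import hf_hub_download
# path = hf_hub_download("Maslionok/sudo_pipelines", "floret_language_recognition.py")
# lid = load_module_from_path(path, "floret_language_recognition")
# lid.floret_model("this is a simple text")
```

Keeping the script in its own module makes it easier to see which names it defines and avoids accidental overwrites of your own variables.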

### **Usage**
Once loaded, call the model on a **plain text input** to detect its language:

```python
floret_model("this is a simple text")
```

**Output Example:**  
```python
'en'
```
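
To label several texts at once, the call above can be wrapped in a small helper. This is a sketch under assumptions: `detect_languages` is a hypothetical name, and `detect` stands for any callable that takes a string and returns a language code, such as `floret_model` once it has been loaded. The whitespace normalization is a precaution, since fastText-style predictors generally expect single-line input; it may be unnecessary if the loaded model already handles newlines.

```python
from collections import Counter


def detect_languages(texts, detect):
    """Apply `detect` to each text; return the labels and a frequency count.

    `detect` is any callable mapping a string to a language code
    (assumed here to behave like `floret_model`).
    """
    labels = []
    for text in texts:
        # Collapse newlines, tabs, and repeated spaces into single spaces.
        cleaned = " ".join(text.split())
        # Skip empty inputs rather than asking the model to guess.
        labels.append(detect(cleaned) if cleaned else None)
    counts = Counter(label for label in labels if label is not None)
    return labels, counts


# Usage sketch, assuming `floret_model` has been loaded as shown earlier:
# labels, counts = detect_languages(["first text", "second\ntext"], floret_model)
```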

---

## OCR Quality Score Calculation
To assess OCR text quality, load the OCR scoring script in the same way. This defines `OCR_score` in the current namespace:

```python
from huggingface_hub import hf_hub_download
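# Download the scoring script from the Hub (cached locally after the first call).
script_path = hf_hub_download("Maslionok/sudo_pipelines", "OCR_score.py")

# Execute the script in the current namespace; this runs the downloaded code.
with open(script_path) as f:
    exec(f.read())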
```

### **Usage**
Call `OCR_score()` on your text:

- **Automatic language detection:**
  ```python
  OCR_score("some OCR-extracted text")
  ```
- **Specify a language manually:**
  ```python
  OCR_score("some OCR-extracted text", language="en")
  ```
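
For intuition about how a Bloom filter can support a quality score, here is a self-contained toy sketch. It is not the implementation behind `OCR_score`: the `TinyBloomFilter` class, the `known_token_ratio` function, the whitespace tokenization, and the "share of known tokens" formula are all assumptions made for illustration. The idea is that a Bloom filter stores a vocabulary compactly, and text with many tokens missing from that vocabulary is likely to contain OCR errors.

```python
import hashlib


class TinyBloomFilter:
    """Toy Bloom filter for illustration (not the repository's filters)."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive `num_hashes` bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def known_token_ratio(text, vocabulary):
    """Return the share of whitespace-separated tokens found in `vocabulary`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in vocabulary) / len(tokens)


# Build a tiny vocabulary and score a clean and a noisy string.
vocab = TinyBloomFilter()
for word in ["the", "quick", "brown", "fox"]:
    vocab.add(word)

clean_score = known_token_ratio("the quick brown fox", vocab)
noisy_score = known_token_ratio("the qu1ck brown f0x", vocab)
```

Because Bloom filters can return false positives but never false negatives, a score computed this way can slightly overestimate quality but will not penalize words that are actually in the vocabulary.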