---
license: mit
title: Product Categorization Demo
sdk: streamlit
emoji: 🚀
colorFrom: purple
colorTo: yellow
# sdk_version: (optional; pin a specific Streamlit version here if needed)
---
# Product Categorization App - Streamlit Demo

This Streamlit application categorizes products by measuring their embedding similarity to ingredients or predefined categories (e.g., with Voyage AI embeddings), optionally refining the results with a reranking step (Voyage AI or OpenAI).

## Quick Start

1.  **Clone the repository:**
    ```bash
    git clone <repository_url>
    cd <repository_directory>
    ```
2.  **Create a virtual environment (optional but recommended):**
    ```bash
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
    ```
3.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```
4.  **Prepare Embeddings:** Ensure your embedding files (`ingredient_embeddings_voyageai.pkl`, `category_embeddings.pickle`, etc.) are present in the `data/` directory.
5.  **Configure API Keys:**
    *   Copy `.env.example` to `.env` if the repository provides one; otherwise, create a new file named `.env`.
    *   Add your API keys to the `.env` file:
        ```dotenv
        VOYAGE_API_KEY="YOUR_VOYAGE_API_KEY_HERE"
        OPENAI_API_KEY="YOUR_OPENAI_API_KEY_HERE"
        # Add other keys like CHICORY if needed
        ```
6.  **Run the application:**
    ```bash
    streamlit run app.py
    ```
    Alternatively, use the example `run_app.sh` script (adapt it to your setup first and make it executable with `chmod +x run_app.sh`):
    ```bash
    ./run_app.sh
    ```
7.  Streamlit opens the app in your default web browser. If it does not, visit the local URL printed in the terminal (by default `http://localhost:8501`).
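
Before launching, it can help to confirm that the API keys from step 5 are visible to Python. The following is a minimal sketch, not the app's own loading code: it assumes keys are plain `KEY="value"` lines and uses a hypothetical stdlib-only helper in place of a library such as python-dotenv. Like python-dotenv's default behaviour, it does not overwrite variables that are already set, so it also works on Hugging Face Spaces, where secrets arrive as environment variables.

```python
import os
from pathlib import Path

# Keys named in this README; append others (e.g. a CHICORY key) if your setup needs them.
EXPECTED_KEYS = ("VOYAGE_API_KEY", "OPENAI_API_KEY")


def load_env_file(path: str = ".env") -> None:
    """Hypothetical stand-in for python-dotenv: copy KEY="value" lines into os.environ.

    Variables that are already set (e.g. Space secrets) are left untouched.
    Inline comments after a value are not supported by this sketch.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return  # No .env file; normal on Hugging Face Spaces.
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # Skip blank lines, comments, and lines without '='.
        key, value = line.split("=", 1)
        value = value.strip().strip('"').strip("'")  # Drop surrounding quotes.
        os.environ.setdefault(key.strip(), value)


def missing_keys(keys=EXPECTED_KEYS) -> list:
    """Return the expected keys that are unset or empty."""
    return [k for k in keys if not os.environ.get(k)]


if __name__ == "__main__":
    load_env_file()
    absent = missing_keys()
    print("Missing API keys: " + ", ".join(absent) if absent else "All API keys found.")
```

Note that the placeholder values from the template (`YOUR_..._HERE`) count as "present" here, so replace them with real keys.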

## Features

-   **Multiple Matching Methods:**
    -   Ingredient Embeddings
    -   Category Embeddings
    -   Voyage AI Reranking (Ingredients/Categories)
    -   OpenAI Reranking (Ingredients/Categories)
    -   Comparison View across methods
-   **Text Input:** Enter product names one per line.
-   **Description Expansion:** Optionally use OpenAI to expand product descriptions before matching.
-   **Adjustable Parameters:** Control Top-N results, confidence thresholds, etc. for different methods.
-   **Example Loading:** Quickly load sample product names.
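
The embedding-based methods boil down to comparing a product's vector with precomputed ingredient or category vectors and keeping the best matches. A minimal sketch of that step, assuming cosine similarity as the score and embeddings stored as plain lists of floats keyed by label (the app's actual scoring may differ); `parse_product_names` and `top_n_matches` are hypothetical names, not functions from this repository. The reranking methods would then re-score a shortlist like this one with Voyage AI or OpenAI, which is not shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def parse_product_names(text: str) -> List[str]:
    """Split text-area input into product names, one per line, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_matches(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    n: int = 5,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every candidate, drop those below `threshold`, and return the best `n`."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # Highest similarity first.
    return kept[:n]
```

With toy 3-dimensional vectors such as `{"Dairy": [1, 0, 0], "Bakery": [0, 1, 0], "Snacks": [0.6, 0.8, 0]}` and a query of `[0.8, 0.6, 0]`, calling `top_n_matches(query, categories, n=2, threshold=0.7)` keeps "Snacks" and "Dairy" and drops "Bakery"; the Top-N and threshold controls in the UI play the role of `n` and `threshold`.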

## Hosting on Hugging Face Spaces

1.  Create a free account on [Hugging Face](https://huggingface.co/).
2.  Go to [Hugging Face Spaces](https://huggingface.co/spaces).
3.  Click "Create a new Space".
4.  Select "Streamlit" as the SDK.
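5.  Choose the Space's visibility (public or private). Every Space is backed by a Git repository.
6.  Upload all project files (including the `data/` directory with embeddings) to the Space repository, either through the web interface or with `git push`.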
7.  **Important:** Add your API keys (`VOYAGE_API_KEY`, `OPENAI_API_KEY`, etc.) as **Secrets** in your Hugging Face Space settings. Do *not* commit the `.env` file directly.
8.  Your app should build and deploy automatically.

## Files Included

-   `app.py`: The main Streamlit application entry point.
-   `ui.py`: Defines the Streamlit UI layout and components.
-   `*.py` (various): Backend logic for embeddings, matching, API calls, formatting.
-   `requirements.txt`: Required Python packages.
-   `.env`: File to store API keys (add your keys here, **do not commit**).
-   `run_app.sh`: Example script to run the app locally.
-   `data/`: Directory containing embedding files.
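
The exact layout of the pickled files in `data/` is not documented here. A minimal sketch for loading and sanity-checking one, assuming each file holds a dict that maps an ingredient or category name to its vector; `load_embeddings` and `describe` are hypothetical helpers, not part of the repository. Only unpickle files you trust, because `pickle.load` can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any, Dict

DATA_DIR = Path("data")  # Assumed location, matching the Files Included list.


def load_embeddings(filename: str, data_dir: Path = DATA_DIR) -> Dict[str, Any]:
    """Load a pickled embeddings file, assuming it contains a name -> vector dict."""
    path = data_dir / filename
    if not path.is_file():
        raise FileNotFoundError(f"Embedding file not found: {path}")
    with path.open("rb") as fh:
        obj = pickle.load(fh)  # Trusted files only: unpickling can run code.
    if not isinstance(obj, dict):
        raise TypeError(f"Expected a dict in {path}, got {type(obj).__name__}")
    return obj


def describe(embeddings: Dict[str, Any]) -> str:
    """One-line summary: number of entries and the length of the first vector."""
    if not embeddings:
        return "0 entries"
    first = next(iter(embeddings.values()))
    return f"{len(embeddings)} entries, dimension {len(first)}"
```

For example, `print(describe(load_embeddings("ingredient_embeddings_voyageai.pkl")))` gives a quick check that a file is present and has the expected shape before starting the app.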

## Requirements

-   Python 3.8+
-   API keys for Voyage AI and/or OpenAI (stored in `.env` locally, or as Space secrets when hosted).
-   Internet connection for API calls.