Use streamlit
- README.md +66 -31
- app.py +43 -22
- comparison.py +40 -37
- embeddings.py +3 -2
- requirements.txt +3 -1
- ui.py +244 -208
- ui_category_matching.py +13 -12
- ui_expanded_matching.py +27 -32
- ui_hybrid_matching.py +39 -36
- ui_ingredient_matching.py +3 -3
README.md
CHANGED
@@ -1,52 +1,87 @@
|
|
1 |
---
|
2 |
license: mit
|
3 |
-
title: Demo
|
4 |
-
sdk:
|
5 |
emoji: 🚀
|
6 |
colorFrom: purple
|
7 |
colorTo: yellow
|
8 |
-
sdk_version:
|
9 |
---
|
10 |
-
# Product Categorization App -
|
11 |
|
12 |
-
This is a
|
13 |
|
14 |
## Quick Start
|
15 |
|
16 |
-
1.
|
17 |
-
|
18 |
|
19 |
-
|
20 |
-
bash run_app.sh
|
21 |
-
```
|
22 |
|
23 |
-
|
24 |
-
|
25 |
-
|
26 |
-
|
27 |
-
-
|
28 |
-
-
|
29 |
-
-
|
30 |
-
-
|
31 |
|
32 |
## Hosting on Hugging Face Spaces
|
33 |
|
34 |
-
|
35 |
-
|
36 |
-
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
|
42 |
|
43 |
## Files Included
|
44 |
|
45 |
-
-
|
46 |
-
-
|
47 |
-
-
|
48 |
|
49 |
## Requirements
|
50 |
|
51 |
-
-
|
52 |
-
-
|
|
|
|
1 |
---
|
2 |
license: mit
|
3 |
+
title: Product Categorization Demo
|
4 |
+
sdk: streamlit
|
5 |
emoji: 🚀
|
6 |
colorFrom: purple
|
7 |
colorTo: yellow
|
8 |
+
# sdk_version: (Streamlit doesn't typically use a fixed version here)
|
9 |
---
|
10 |
+
# Product Categorization App - Streamlit Demo
|
11 |
|
12 |
+
This is a Streamlit application for categorizing products based on their similarity to ingredients or predefined categories using AI embeddings (e.g., Voyage AI) and optional reranking (Voyage AI, OpenAI).
|
13 |
|
14 |
## Quick Start
|
15 |
|
16 |
+
1. **Clone the repository:**
|
17 |
+
```bash
|
18 |
+
git clone <repository_url>
|
19 |
+
cd <repository_directory>
|
20 |
+
```
|
21 |
+
2. **Create a virtual environment (optional but recommended):**
|
22 |
+
```bash
|
23 |
+
python -m venv venv
|
24 |
+
source venv/bin/activate # On Windows use `venv\Scripts\activate`
|
25 |
+
```
|
26 |
+
3. **Install dependencies:**
|
27 |
+
```bash
|
28 |
+
pip install -r requirements.txt
|
29 |
+
```
|
30 |
+
4. **Prepare Embeddings:** Ensure your embedding files (`ingredient_embeddings_voyageai.pkl`, `category_embeddings.pickle`, etc.) are present in the `data/` directory.
|
31 |
+
5. **Configure API Keys:**
|
32 |
+
* Copy the `.env.example` file (if it exists) or create a new file named `.env`.
|
33 |
+
* Add your API keys to the `.env` file:
|
34 |
+
```dotenv
|
35 |
+
VOYAGE_API_KEY="YOUR_VOYAGE_API_KEY_HERE"
|
36 |
+
OPENAI_API_KEY="YOUR_OPENAI_API_KEY_HERE"
|
37 |
+
# Add other keys like CHICORY if needed
|
38 |
+
```
|
39 |
+
6. **Run the application:**
|
40 |
+
```bash
|
41 |
+
streamlit run app.py
|
42 |
+
```
|
43 |
+
Alternatively, if you have configured the `./run_app.sh` script:
|
44 |
+
```bash
|
45 |
+
./run_app.sh
|
46 |
+
```
|
47 |
+
7. The application will open in your default web browser.
|
48 |
|
49 |
+
## Features
|
50 |
|
51 |
+
- **Multiple Matching Methods:**
|
52 |
+
- Ingredient Embeddings
|
53 |
+
- Category Embeddings
|
54 |
+
- Voyage AI Reranking (Ingredients/Categories)
|
55 |
+
- OpenAI Reranking (Ingredients/Categories)
|
56 |
+
- Comparison View across methods
|
57 |
+
- **Text Input:** Enter product names one per line.
|
58 |
+
- **Description Expansion:** Optionally use OpenAI to expand product descriptions before matching.
|
59 |
+
- **Adjustable Parameters:** Control Top-N results, confidence thresholds, etc. for different methods.
|
60 |
+
- **Example Loading:** Quickly load sample product names.
|
61 |
|
62 |
## Hosting on Hugging Face Spaces
|
63 |
|
64 |
+
1. Create a free account on [Hugging Face](https://huggingface.co/).
|
65 |
+
2. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
|
66 |
+
3. Click "Create a new Space".
|
67 |
+
4. Select "Streamlit" as the SDK.
|
68 |
+
5. Choose a repository type (usually Git).
|
69 |
+
6. Upload all project files (including the `data` directory with embeddings) to the space repository.
|
70 |
+
7. **Important:** Add your API keys (`VOYAGE_API_KEY`, `OPENAI_API_KEY`, etc.) as **Secrets** in your Hugging Face Space settings. Do *not* commit the `.env` file directly.
|
71 |
+
8. Your app should build and deploy automatically.
|
72 |
|
73 |
## Files Included
|
74 |
|
75 |
+
- `app.py`: The main Streamlit application entry point.
|
76 |
+
- `ui.py`: Defines the Streamlit UI layout and components.
|
77 |
+
- `*.py` (various): Backend logic for embeddings, matching, API calls, formatting.
|
78 |
+
- `requirements.txt`: Required Python packages.
|
79 |
+
- `.env`: File to store API keys (add your keys here, **do not commit**).
|
80 |
+
- `run_app.sh`: Example script to run the app locally.
|
81 |
+
- `data/`: Directory containing embedding files.
|
82 |
|
83 |
## Requirements
|
84 |
|
85 |
+
- Python 3.8+
|
86 |
+
- API keys for Voyage AI and/or OpenAI (stored in `.env`).
|
87 |
+
- Internet connection for API calls.
|
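The setup steps in the README above depend on `VOYAGE_API_KEY` and `OPENAI_API_KEY` being present in the environment (locally through `.env`, on Spaces through Secrets). A minimal sketch of a startup check follows. The helper `missing_api_keys` is hypothetical and not part of this commit; only the two key names come from the README.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Key names are taken from the README; the helper itself is hypothetical.
REQUIRED_KEYS = ("VOYAGE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(
    required: Iterable[str] = REQUIRED_KEYS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required keys that are unset or blank."""
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Since the README lists the keys as "Voyage AI and/or OpenAI", a deployment that uses only one provider could pass a shorter `required` tuple instead of the default.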
app.py
CHANGED
@@ -1,29 +1,50 @@
|
|
1 |
import os
|
2 |
import sys
|
3 |
-
import
|
4 |
from utils import load_embeddings
|
5 |
-
from ui import
|
6 |
|
7 |
# Path to the embeddings file
|
8 |
EMBEDDINGS_PATH = "data/ingredient_embeddings_voyageai.pkl"
|
9 |
|
10 |
-
#
|
11 |
-
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
|
21 |
-
|
22 |
-
|
23 |
-
|
24 |
-
|
25 |
-
|
26 |
-
|
27 |
-
|
28 |
-
|
29 |
-
|
|
1 |
import os
|
2 |
import sys
|
3 |
+
from dotenv import load_dotenv # Import load_dotenv
|
4 |
+
import streamlit as st
|
5 |
+
|
6 |
+
# Load environment variables from .env file at the very beginning
|
7 |
+
load_dotenv()
|
8 |
from utils import load_embeddings
|
9 |
+
from ui import render_ui # Import the new Streamlit UI function
|
10 |
+
|
11 |
+
# Set page config as the first Streamlit command
|
12 |
+
st.set_page_config(layout="wide", page_title="Product Categorization Tool")
|
13 |
+
import ui_core # Import ui_core to set embeddings
|
14 |
|
15 |
# Path to the embeddings file
|
16 |
EMBEDDINGS_PATH = "data/ingredient_embeddings_voyageai.pkl"
|
17 |
|
18 |
+
# Use Streamlit's caching to load embeddings only once
|
19 |
+
@st.cache_data
|
20 |
+
def load_all_embeddings(path):
|
21 |
+
"""Loads embeddings from the specified path."""
|
22 |
+
if not os.path.exists(path):
|
23 |
+
st.error(f"Error: Embeddings file {path} not found!")
|
24 |
+
st.error(f"Please ensure the file exists at {os.path.abspath(path)}")
|
25 |
+
st.stop() # Stop execution if file not found
|
26 |
+
return None # Return None explicitly, although st.stop() halts
|
27 |
+
|
28 |
+
try:
|
29 |
+
embeddings_data = load_embeddings(path)
|
30 |
+
return embeddings_data
|
31 |
+
except Exception as e:
|
32 |
+
st.error(f"Error loading embeddings: {e}")
|
33 |
+
st.stop()
|
34 |
+
return None
|
35 |
+
|
36 |
+
# Load embeddings and make them available to UI modules
|
37 |
+
embeddings_data = load_all_embeddings(EMBEDDINGS_PATH)
|
38 |
+
|
39 |
+
if embeddings_data:
|
40 |
+
# Pass the loaded embeddings to the ui_core module where other UI modules import it from
|
41 |
+
ui_core.embeddings = embeddings_data
|
42 |
+
|
43 |
+
# Render the Streamlit UI
|
44 |
+
render_ui()
|
45 |
+
else:
|
46 |
+
# This part should ideally not be reached due to st.stop() in load_all_embeddings
|
47 |
+
st.error("Failed to load embeddings. Application cannot start.")
|
48 |
+
|
49 |
+
# Note: No __main__ block needed for Streamlit.
|
50 |
+
# Streamlit apps are run using `streamlit run app.py`
|
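The new `app.py` above loads the embeddings once and reuses them across reruns. The same load-once idea can be sketched without Streamlit. This sketch swaps in `functools.lru_cache` from the standard library as a stand-in for `@st.cache_data`, and the function name `load_embeddings_once` and the demo pickle are assumptions made for illustration, not code from the commit.

```python
import functools
import os
import pickle
import tempfile


@functools.lru_cache(maxsize=None)
def load_embeddings_once(path: str) -> dict:
    """Unpickle the file at `path`; repeat calls with the same path reuse the result."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Embeddings file not found: {os.path.abspath(path)}")
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Demo with a throwaway pickle, not the real file under data/.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo_embeddings.pkl")
        with open(demo_path, "wb") as handle:
            pickle.dump({"salt": [0.1, 0.2]}, handle)
        first = load_embeddings_once(demo_path)
        second = load_embeddings_once(demo_path)
        print(first is second)  # the second call is served from the cache
```

One difference worth keeping in mind: `lru_cache` hands back the same object on every call, whereas `st.cache_data` returns a copy of the cached value, so in-place mutation behaves differently between the two.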
comparison.py
CHANGED
@@ -8,7 +8,7 @@ from similarity import hybrid_ingredient_matching
|
|
8 |
from api_utils import process_in_parallel, rank_ingredients_openai
|
9 |
from ui_formatters import format_comparison_html, create_results_container
|
10 |
|
11 |
-
from utils import SafeProgress
|
12 |
from chicory_api import call_chicory_parser
|
13 |
from embeddings import create_product_embeddings
|
14 |
from similarity import compute_similarities
|
@@ -16,7 +16,7 @@ from similarity import compute_similarities
|
|
16 |
def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str, Any],
|
17 |
embedding_top_n: int = 20, final_top_n: int = 3,
|
18 |
confidence_threshold: float = 0.5, match_type="ingredients",
|
19 |
-
|
20 |
"""
|
21 |
Compare multiple ingredient/category matching methods on the same products
|
22 |
|
@@ -43,20 +43,21 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
43 |
else:
|
44 |
print(f"WARNING: First product '{products[0] if products else 'None'}' not found in expanded descriptions")
|
45 |
|
46 |
-
|
|
|
47 |
|
48 |
# Step 1: Generate embeddings for all products (used by multiple methods)
|
49 |
-
progress_tracker(0.1, desc="Generating product embeddings")
|
50 |
# Use expanded descriptions for embeddings if available
|
51 |
if expanded_descriptions:
|
52 |
expanded_product_texts = [expanded_descriptions.get(p, p) for p in products]
|
53 |
-
product_embeddings = create_product_embeddings(expanded_product_texts,
|
54 |
-
original_products=products) # Keep original product IDs
|
55 |
else:
|
56 |
-
product_embeddings = create_product_embeddings(products
|
57 |
|
58 |
# Step 2: Get embedding-based candidates for all products
|
59 |
-
progress_tracker(0.2, desc="Finding embedding candidates")
|
60 |
similarities = compute_similarities(ingredients_dict, product_embeddings)
|
61 |
|
62 |
# Filter to top N candidates per product
|
@@ -65,11 +66,11 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
65 |
embedding_results[product] = product_similarities[:embedding_top_n]
|
66 |
|
67 |
# Step 3: Process with Chicory Parser
|
68 |
-
progress_tracker(0.3, desc="Running Chicory Parser")
|
69 |
# Import here to avoid circular imports
|
70 |
# from chicory_parser import parse_products
|
71 |
|
72 |
-
chicory_results = call_chicory_parser(products
|
73 |
|
74 |
# Initialize result structure
|
75 |
comparison_results = {}
|
@@ -103,7 +104,7 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
103 |
comparison_results[product]["chicory"] = chicory_matches
|
104 |
|
105 |
# Step 4: Process with Voyage AI
|
106 |
-
progress_tracker(0.4, desc="Processing with Voyage AI")
|
107 |
|
108 |
# Define processing function for Voyage
|
109 |
def process_voyage(product):
|
@@ -156,13 +157,17 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
156 |
|
157 |
# Ensure results are in the expected format
|
158 |
formatted_results = []
|
|
|
159 |
for r in results[:final_top_n]:
|
160 |
if isinstance(r, dict) and "name" in r and "score" in r:
|
161 |
# Convert score to float to ensure type compatibility
|
162 |
try:
|
163 |
score = float(r["score"])
|
|
|
164 |
if score >= confidence_threshold:
|
165 |
-
|
|
|
|
|
166 |
except (ValueError, TypeError):
|
167 |
print(f"Invalid score format in result: {r}")
|
168 |
elif isinstance(r, tuple) and len(r) >= 2:
|
@@ -177,7 +182,9 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
177 |
name = r[0]
|
178 |
|
179 |
if score >= confidence_threshold:
|
180 |
-
|
|
|
|
|
181 |
except (ValueError, TypeError):
|
182 |
print(f"Invalid score format in tuple: {r}")
|
183 |
|
@@ -197,11 +204,8 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
197 |
voyage_results = process_in_parallel(
|
198 |
items=products,
|
199 |
processor_func=process_voyage,
|
200 |
-
max_workers=min(20, len(products))
|
201 |
-
|
202 |
-
progress_start=0.4,
|
203 |
-
progress_end=0.65,
|
204 |
-
progress_desc="Voyage AI"
|
205 |
)
|
206 |
|
207 |
# Update comparison results with Voyage results
|
@@ -210,7 +214,7 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
210 |
comparison_results[product]["voyage"] = results
|
211 |
|
212 |
# Step 5: Process with OpenAI
|
213 |
-
progress_tracker(0.7, desc="Running OpenAI processing in parallel")
|
214 |
|
215 |
# Define processing function for OpenAI
|
216 |
def process_openai(product):
|
@@ -261,11 +265,8 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
261 |
openai_results = process_in_parallel(
|
262 |
items=products,
|
263 |
processor_func=process_openai,
|
264 |
-
max_workers=min(20, len(products))
|
265 |
-
|
266 |
-
progress_start=0.7,
|
267 |
-
progress_end=0.95,
|
268 |
-
progress_desc="OpenAI"
|
269 |
)
|
270 |
|
271 |
# Update comparison results with OpenAI results
|
@@ -303,12 +304,12 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
|
|
303 |
|
304 |
method_results[method] = formatted_results
|
305 |
|
306 |
-
progress_tracker(1.0, desc="Comparison complete")
|
307 |
return comparison_results
|
308 |
|
309 |
def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
|
310 |
final_top_n=3, confidence_threshold=0.5,
|
311 |
-
match_type="categories", use_expansion=False
|
312 |
"""
|
313 |
Compare multiple ingredient matching methods on the same products
|
314 |
|
@@ -324,10 +325,12 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
|
|
324 |
Returns:
|
325 |
HTML formatted comparison results
|
326 |
"""
|
327 |
-
from utils import SafeProgress
|
|
|
328 |
|
329 |
-
|
330 |
-
progress_tracker(
|
|
|
331 |
|
332 |
# Split text input by lines and remove empty lines
|
333 |
if not product_input:
|
@@ -338,7 +341,7 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
|
|
338 |
|
339 |
# Load appropriate embeddings based on match type
|
340 |
try:
|
341 |
-
progress_tracker(0.2, desc="Loading embeddings")
|
342 |
if match_type == "ingredients":
|
343 |
embeddings_path = "data/ingredient_embeddings_voyageai.pkl"
|
344 |
embeddings_dict = load_embeddings(embeddings_path)
|
@@ -355,20 +358,20 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
|
|
355 |
# Expand descriptions if requested
|
356 |
if use_expansion:
|
357 |
from openai_expansion import expand_product_descriptions
|
358 |
-
progress_tracker(0.25, desc="Expanding product descriptions")
|
359 |
-
expanded_products = expand_product_descriptions(product_names
|
360 |
# Add at beginning of results
|
361 |
header_text = f"Comparing {len(product_names)} products using multiple {match_type} matching methods WITH expanded descriptions."
|
362 |
|
363 |
-
progress_tracker(0.3, desc="Comparing methods")
|
364 |
comparison_results = compare_ingredient_methods(
|
365 |
products=product_names,
|
366 |
ingredients_dict=embeddings_dict,
|
367 |
embedding_top_n=embedding_top_n,
|
368 |
final_top_n=final_top_n,
|
369 |
confidence_threshold=confidence_threshold,
|
370 |
-
match_type=match_type,
|
371 |
-
progress
|
372 |
expanded_descriptions=expanded_products
|
373 |
)
|
374 |
except Exception as e:
|
@@ -377,7 +380,7 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
|
|
377 |
return f"<div style='color: red;'>Error comparing methods: {str(e)}<br><pre>{error_details}</pre></div>"
|
378 |
|
379 |
# Format results as HTML using centralized formatters
|
380 |
-
progress_tracker(0.9, desc="Formatting results")
|
381 |
result_elements = []
|
382 |
for product in product_names:
|
383 |
if product in comparison_results:
|
@@ -393,5 +396,5 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
|
|
393 |
header_text=header_text
|
394 |
)
|
395 |
|
396 |
-
progress_tracker(1.0, desc="Complete")
|
397 |
return output_html
|
|
|
8 |
from api_utils import process_in_parallel, rank_ingredients_openai
|
9 |
from ui_formatters import format_comparison_html, create_results_container
|
10 |
|
11 |
+
# from utils import SafeProgress # Removed SafeProgress import
|
12 |
from chicory_api import call_chicory_parser
|
13 |
from embeddings import create_product_embeddings
|
14 |
from similarity import compute_similarities
|
|
|
16 |
def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str, Any],
|
17 |
embedding_top_n: int = 20, final_top_n: int = 3,
|
18 |
confidence_threshold: float = 0.5, match_type="ingredients",
|
19 |
+
expanded_descriptions=None) -> Dict[str, Dict[str, List[Tuple]]]: # Removed progress parameter
|
20 |
"""
|
21 |
Compare multiple ingredient/category matching methods on the same products
|
22 |
|
|
|
43 |
else:
|
44 |
print(f"WARNING: First product '{products[0] if products else 'None'}' not found in expanded descriptions")
|
45 |
|
46 |
+
# Removed Gradio progress tracking
|
47 |
+
# progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
|
48 |
|
49 |
# Step 1: Generate embeddings for all products (used by multiple methods)
|
50 |
+
# progress_tracker(0.1, desc="Generating product embeddings") # Removed progress
|
51 |
# Use expanded descriptions for embeddings if available
|
52 |
if expanded_descriptions:
|
53 |
expanded_product_texts = [expanded_descriptions.get(p, p) for p in products]
|
54 |
+
product_embeddings = create_product_embeddings(expanded_product_texts,
|
55 |
+
original_products=products) # Keep original product IDs, removed progress
|
56 |
else:
|
57 |
+
product_embeddings = create_product_embeddings(products) # Removed progress
|
58 |
|
59 |
# Step 2: Get embedding-based candidates for all products
|
60 |
+
# progress_tracker(0.2, desc="Finding embedding candidates") # Removed progress
|
61 |
similarities = compute_similarities(ingredients_dict, product_embeddings)
|
62 |
|
63 |
# Filter to top N candidates per product
|
|
|
66 |
embedding_results[product] = product_similarities[:embedding_top_n]
|
67 |
|
68 |
# Step 3: Process with Chicory Parser
|
69 |
+
# progress_tracker(0.3, desc="Running Chicory Parser") # Removed progress
|
70 |
# Import here to avoid circular imports
|
71 |
# from chicory_parser import parse_products
|
72 |
|
73 |
+
chicory_results = call_chicory_parser(products) # Removed progress
|
74 |
|
75 |
# Initialize result structure
|
76 |
comparison_results = {}
|
|
|
104 |
comparison_results[product]["chicory"] = chicory_matches
|
105 |
|
106 |
# Step 4: Process with Voyage AI
|
107 |
+
# progress_tracker(0.4, desc="Processing with Voyage AI") # Removed progress
|
108 |
|
109 |
# Define processing function for Voyage
|
110 |
def process_voyage(product):
|
|
|
157 |
|
158 |
# Ensure results are in the expected format
|
159 |
formatted_results = []
|
160 |
+
added_ids = set() # Keep track of added category IDs to avoid duplicates
|
161 |
for r in results[:final_top_n]:
|
162 |
if isinstance(r, dict) and "name" in r and "score" in r:
|
163 |
# Convert score to float to ensure type compatibility
|
164 |
try:
|
165 |
score = float(r["score"])
|
166 |
+
name = r["name"] # Extract name for check
|
167 |
if score >= confidence_threshold:
|
168 |
+
if name not in added_ids: # Check for duplicates
|
169 |
+
formatted_results.append((name, score))
|
170 |
+
added_ids.add(name) # Add ID to set
|
171 |
except (ValueError, TypeError):
|
172 |
print(f"Invalid score format in result: {r}")
|
173 |
elif isinstance(r, tuple) and len(r) >= 2:
|
|
|
182 |
name = r[0]
|
183 |
|
184 |
if score >= confidence_threshold:
|
185 |
+
if name not in added_ids: # Check for duplicates
|
186 |
+
formatted_results.append((name, score))
|
187 |
+
added_ids.add(name) # Add ID to set
|
188 |
except (ValueError, TypeError):
|
189 |
print(f"Invalid score format in tuple: {r}")
|
190 |
|
|
|
204 |
voyage_results = process_in_parallel(
|
205 |
items=products,
|
206 |
processor_func=process_voyage,
|
207 |
+
max_workers=min(20, len(products))
|
208 |
+
# Removed ALL progress tracking arguments
|
209 |
)
|
210 |
|
211 |
# Update comparison results with Voyage results
|
|
|
214 |
comparison_results[product]["voyage"] = results
|
215 |
|
216 |
# Step 5: Process with OpenAI
|
217 |
+
# progress_tracker(0.7, desc="Running OpenAI processing in parallel") # Removed progress
|
218 |
|
219 |
# Define processing function for OpenAI
|
220 |
def process_openai(product):
|
|
|
265 |
openai_results = process_in_parallel(
|
266 |
items=products,
|
267 |
processor_func=process_openai,
|
268 |
+
max_workers=min(20, len(products))
|
269 |
+
# Removed ALL progress tracking arguments
|
270 |
)
|
271 |
|
272 |
# Update comparison results with OpenAI results
|
|
|
304 |
|
305 |
method_results[method] = formatted_results
|
306 |
|
307 |
+
# progress_tracker(1.0, desc="Comparison complete") # Removed progress
|
308 |
return comparison_results
|
309 |
|
310 |
def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
|
311 |
final_top_n=3, confidence_threshold=0.5,
|
312 |
+
match_type="categories", use_expansion=False): # Removed progress parameter
|
313 |
"""
|
314 |
Compare multiple ingredient matching methods on the same products
|
315 |
|
|
|
325 |
Returns:
|
326 |
HTML formatted comparison results
|
327 |
"""
|
328 |
+
# from utils import SafeProgress # Removed SafeProgress import
|
329 |
+
from utils import load_embeddings
|
330 |
|
331 |
+
# Removed Gradio progress tracking
|
332 |
+
# progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
|
333 |
+
# progress_tracker(0.1, desc="Processing input")
|
334 |
|
335 |
# Split text input by lines and remove empty lines
|
336 |
if not product_input:
|
|
|
341 |
|
342 |
# Load appropriate embeddings based on match type
|
343 |
try:
|
344 |
+
# progress_tracker(0.2, desc="Loading embeddings") # Removed progress
|
345 |
if match_type == "ingredients":
|
346 |
embeddings_path = "data/ingredient_embeddings_voyageai.pkl"
|
347 |
embeddings_dict = load_embeddings(embeddings_path)
|
|
|
358 |
# Expand descriptions if requested
|
359 |
if use_expansion:
|
360 |
from openai_expansion import expand_product_descriptions
|
361 |
+
# progress_tracker(0.25, desc="Expanding product descriptions") # Removed progress
|
362 |
+
expanded_products = expand_product_descriptions(product_names) # Removed progress argument
|
363 |
# Add at beginning of results
|
364 |
header_text = f"Comparing {len(product_names)} products using multiple {match_type} matching methods WITH expanded descriptions."
|
365 |
|
366 |
+
# progress_tracker(0.3, desc="Comparing methods") # Removed progress
|
367 |
comparison_results = compare_ingredient_methods(
|
368 |
products=product_names,
|
369 |
ingredients_dict=embeddings_dict,
|
370 |
embedding_top_n=embedding_top_n,
|
371 |
final_top_n=final_top_n,
|
372 |
confidence_threshold=confidence_threshold,
|
373 |
+
match_type=match_type, # Added missing comma
|
374 |
+
# Removed progress argument
|
375 |
expanded_descriptions=expanded_products
|
376 |
)
|
377 |
except Exception as e:
|
|
|
380 |
return f"<div style='color: red;'>Error comparing methods: {str(e)}<br><pre>{error_details}</pre></div>"
|
381 |
|
382 |
# Format results as HTML using centralized formatters
|
383 |
+
# progress_tracker(0.9, desc="Formatting results") # Removed progress
|
384 |
result_elements = []
|
385 |
for product in product_names:
|
386 |
if product in comparison_results:
|
|
|
396 |
header_text=header_text
|
397 |
)
|
398 |
|
399 |
+
# progress_tracker(1.0, desc="Complete") # Removed progress
|
400 |
return output_html
|
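The `comparison.py` changes above add an `added_ids` set so that the same name is not appended twice while results are filtered by `confidence_threshold`. The combined filter-and-deduplicate step can be sketched in isolation as below. The function name `normalize_matches` is hypothetical, and the tuple layout `(name, score, ...)` is an assumption, because the diff elides the lines that read the score from a tuple.

```python
from typing import Any, Iterable, List, Tuple


def normalize_matches(
    results: Iterable[Any],
    confidence_threshold: float = 0.5,
    final_top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Convert mixed dict/tuple results into unique (name, score) pairs."""
    formatted: List[Tuple[str, float]] = []
    seen = set()  # names already emitted, mirrors `added_ids` in the diff
    for r in list(results)[:final_top_n]:
        if isinstance(r, dict) and "name" in r and "score" in r:
            name, raw_score = r["name"], r["score"]
        elif isinstance(r, tuple) and len(r) >= 2:
            # Assumption: tuples are laid out as (name, score, ...).
            name, raw_score = r[0], r[1]
        else:
            continue
        try:
            score = float(raw_score)
        except (ValueError, TypeError):
            continue  # skip entries whose score cannot be parsed
        if score >= confidence_threshold and name not in seen:
            formatted.append((name, score))
            seen.add(name)
    return formatted


if __name__ == "__main__":
    sample = [{"name": "salt", "score": "0.9"}, ("salt", 0.8), ("pepper", 0.7)]
    print(normalize_matches(sample))
```

As in the diff, the slice to `final_top_n` happens before filtering, so duplicates or low scores among the first entries can leave fewer than `final_top_n` results.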
embeddings.py
CHANGED
@@ -6,8 +6,9 @@ import time
|
|
6 |
import numpy as np
|
7 |
from concurrent.futures import ThreadPoolExecutor
|
8 |
|
9 |
-
#
|
10 |
-
voyageai.
|
|
|
11 |
|
12 |
def get_embeddings_batch(texts, model="voyage-3-large", batch_size=100):
|
13 |
"""Get embeddings for a list of texts in batches"""
|
|
|
6 |
import numpy as np
|
7 |
from concurrent.futures import ThreadPoolExecutor
|
8 |
|
9 |
+
# Voyage AI API key is now loaded via environment variable
|
10 |
+
# when voyageai.Client() is initialized (after load_dotenv runs)
|
11 |
+
# voyageai.api_key = os.getenv("VOYAGE_API_KEY") # Removed global setting
|
12 |
|
13 |
def get_embeddings_batch(texts, model="voyage-3-large", batch_size=100):
|
14 |
"""Get embeddings for a list of texts in batches"""
|
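The diff shows only the signature of `get_embeddings_batch(texts, model="voyage-3-large", batch_size=100)`, not its body. The batching idea it names can be sketched as follows, under stated assumptions: `embed_in_batches` and the `embed_fn` callable are hypothetical stand-ins for whatever API call the real function makes, and no Voyage AI call is made here.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[List[float]]:
    """Call `embed_fn` on consecutive slices of `texts` and concatenate the vectors."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed_fn(batch))  # output order follows input order
    return vectors


if __name__ == "__main__":
    # Fake embedder for the demo: one-dimensional "vector" holding the text length.
    def fake_embed(batch: List[str]) -> List[List[float]]:
        return [[float(len(text))] for text in batch]

    print(embed_in_batches(["milk", "olive oil", "rye bread"], fake_embed, batch_size=2))
```

Passing the embedder in as a callable keeps the batching logic testable without network access or an API key.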
requirements.txt
CHANGED
@@ -1,6 +1,8 @@
|
|
1 |
voyageai
|
2 |
numpy
|
3 |
-
|
|
|
4 |
openai
|
5 |
requests
|
6 |
tqdm
|
|
|
|
1 |
voyageai
|
2 |
numpy
|
3 |
+
streamlit
|
4 |
+
pandas
|
5 |
openai
|
6 |
requests
|
7 |
tqdm
|
8 |
+
python-dotenv
|
ui.py
CHANGED
@@ -1,222 +1,258 @@
|
|
1 |
-
import
|
|
|
2 |
from comparison import compare_ingredient_methods_ui
|
3 |
-
|
4 |
-
# Import from our UI modules
|
5 |
-
from ui_core import embeddings, get_css, load_examples
|
6 |
from ui_ingredient_matching import categorize_products
|
7 |
from ui_category_matching import categorize_products_by_category
|
8 |
from ui_hybrid_matching import categorize_products_with_voyage_reranking
|
9 |
from ui_expanded_matching import categorize_products_with_openai_reranking
|
|
|
10 |
|
11 |
-
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
|
21 |
-
|
22 |
-
|
23 |
-
|
24 |
-
|
25 |
-
|
26 |
-
|
27 |
-
|
28 |
-
|
29 |
-
|
30 |
-
|
31 |
-
|
32 |
-
|
33 |
-
|
34 |
-
|
35 |
-
|
36 |
-
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
|
42 |
-
|
43 |
-
|
44 |
-
|
45 |
-
|
46 |
-
|
47 |
-
|
48 |
-
|
49 |
-
|
50 |
-
|
51 |
-
|
52 |
-
|
53 |
-
|
54 |
-
|
55 |
-
|
56 |
-
|
57 |
-
|
58 |
-
|
59 |
-
|
60 |
-
|
61 |
-
|
62 |
-
|
63 |
-
|
64 |
-
|
65 |
-
|
66 |
-
|
67 |
-
|
68 |
-
|
69 |
-
|
70 |
-
|
71 |
-
|
72 |
-
|
73 |
-
|
74 |
-
|
75 |
-
|
76 |
-
|
77 |
-
|
78 |
-
|
79 |
-
|
80 |
-
|
81 |
-
|
82 |
-
placeholder="Enter product names, one per line",
|
83 |
-
label="Product Names"
|
84 |
-
)
|
85 |
-
with gr.Row():
|
86 |
-
tab_expansion = gr.Checkbox(
|
87 |
-
value=False,
|
88 |
-
label="Use Description Expansion",
|
89 |
-
info="Expand product descriptions using AI before matching"
|
90 |
-
)
|
91 |
-
tab_emb_top_n = gr.Slider(1, 50, 20, step=1, label="Embedding Top N Results")
|
92 |
-
tab_top_n = gr.Slider(1, 10, 5, step=1, label="Final Top N Results")
|
93 |
-
tab_confidence = gr.Slider(0.1, 0.9, 0.5, label="Matching Threshold")
|
94 |
-
|
95 |
-
tab_match_type = gr.Radio(
|
96 |
-
choices=["categories", "ingredients"],
|
97 |
-
value=default_match,
|
98 |
-
label="Match Type",
|
99 |
-
info="Choose whether to match against ingredients or categories"
|
100 |
-
)
|
101 |
-
|
102 |
-
with gr.Row():
|
103 |
-
tab_examples_btn = gr.Button("Load Examples", variant="secondary")
|
104 |
-
tab_match_btn = gr.Button(f"Match using {tab_name}", variant="primary")
|
105 |
-
|
106 |
-
with gr.Column(scale=1):
|
107 |
-
# Results section
|
108 |
-
tab_output = gr.HTML(label=f"{tab_name} Results", elem_id="results-container")
|
109 |
-
|
110 |
-
# Connect button events
|
111 |
-
tab_match_btn.click(
|
112 |
-
fn=fn_name,
|
113 |
-
inputs=[tab_input, gr.State(False), tab_expansion, tab_emb_top_n,
|
114 |
-
tab_top_n, tab_confidence, tab_match_type],
|
115 |
-
outputs=[tab_output],
|
116 |
)
|
117 |
-
|
118 |
-
|
119 |
-
|
120 |
-
|
121 |
-
|
122 |
)
|
123 |
-
|
124 |
-
|
125 |
-
|
126 |
-
|
127 |
-
|
128 |
-
|
129 |
-
|
130 |
-
|
131 |
-
|
132 |
-
|
133 |
-
|
134 |
-
|
135 |
-
|
136 |
-
|
137 |
-
|
138 |
-
|
139 |
-
|
140 |
-
|
141 |
-
|
142 |
-
|
143 |
-
|
144 |
-
|
145 |
-
|
146 |
-
|
147 |
-
|
148 |
-
|
149 |
-
|
150 |
-
|
151 |
-
|
152 |
-
|
153 |
-
|
154 |
-
|
155 |
-
|
156 |
-
|
157 |
-
|
158 |
-
|
159 |
-
|
160 |
-
|
161 |
-
|
162 |
-
|
163 |
-
|
164 |
)
|
165 |
-
|
166 |
-
|
167 |
-
|
168 |
-
|
169 |
-
|
170 |
-
|
171 |
-
|
172 |
-
|
173 |
-
|
174 |
-
|
175 |
-
|
176 |
-
|
177 |
compare_embedding_top_n,
|
178 |
compare_final_top_n,
|
179 |
compare_confidence_threshold,
|
180 |
compare_match_type,
|
181 |
compare_expansion
|
182 |
-
|
183 |
-
|
184 |
-
|
185 |
-
|
186 |
-
|
187 |
-
|
188 |
-
|
189 |
-
inputs=[],
|
190 |
-
outputs=compare_product_input
|
191 |
-
)
|
192 |
-
|
193 |
-
# Connect buttons for ingredient matching
|
194 |
-
categorize_btn.click(
|
195 |
-
fn=categorize_products,
|
196 |
-
inputs=[text_input, gr.State(False), use_expansion, top_n, confidence],
|
197 |
-
outputs=[text_output],
|
198 |
-
)
|
199 |
-
|
200 |
-
# Connect buttons for category matching
|
201 |
-
match_categories_btn.click(
|
202 |
-
fn=categorize_products_by_category,
|
203 |
-
inputs=[category_text_input, gr.State(False), category_use_expansion, category_top_n, category_confidence],
|
204 |
-
outputs=[category_output],
|
205 |
-
)
|
206 |
-
|
207 |
-
# Examples buttons for the first two tabs
|
208 |
-
examples_btn.click(
|
209 |
-
fn=load_examples,
|
210 |
-
inputs=[],
|
211 |
-
outputs=text_input
|
212 |
-
)
|
213 |
-
|
214 |
-
category_examples_btn.click(
|
215 |
-
fn=load_examples, # Reuse the same examples
|
216 |
-
inputs=[],
|
217 |
-
outputs=category_text_input
|
218 |
-
)
|
219 |
-
|
220 |
-
gr.Markdown("Powered by Voyage AI embeddings • Built with Gradio")
|
221 |
-
|
222 |
-
return demo
|
|
|
1 |
+
import streamlit as st
|
2 |
+
import pandas as pd
|
3 |
from comparison import compare_ingredient_methods_ui
|
4 |
+
from ui_core import embeddings, load_examples
|
|
|
|
|
5 |
from ui_ingredient_matching import categorize_products
|
6 |
from ui_category_matching import categorize_products_by_category
|
7 |
from ui_hybrid_matching import categorize_products_with_voyage_reranking
|
8 |
from ui_expanded_matching import categorize_products_with_openai_reranking
|
9 |
+
# Removed unused import: from ui_formatters import format_results_html
|
10 |
|
11 |
+
# Initialize session state keys if they don't exist
|
12 |
+
if 'ingredient_input' not in st.session_state:
|
13 |
+
st.session_state.ingredient_input = ""
|
14 |
+
if 'category_input' not in st.session_state:
|
15 |
+
st.session_state.category_input = ""
|
16 |
+
if 'voyage_input' not in st.session_state:
|
17 |
+
st.session_state.voyage_input = ""
|
18 |
+
if 'openai_input' not in st.session_state:
|
19 |
+
st.session_state.openai_input = ""
|
20 |
+
if 'compare_input' not in st.session_state:
|
21 |
+
st.session_state.compare_input = ""
|
22 |
+
|
23 |
+
|
24 |
+
def render_ui():
|
25 |
+
"""Render the Streamlit interface"""
|
26 |
+
# Page config is now set in app.py
|
27 |
+
st.title("Product Categorization Tool")
|
28 |
+
st.markdown("Analyze products by matching to ingredients or categories using AI embeddings.")
|
29 |
+
|
30 |
+
# Use st.tabs for the different sections
|
31 |
+
tab_ingredient, tab_category, tab_voyage, tab_openai, tab_compare = st.tabs([
|
32 |
+
"Ingredient Embeddings",
|
33 |
+
"Category Embeddings",
|
34 |
+
"Voyage AI Reranking",
|
35 |
+
"OpenAI Reranking",
|
36 |
+
"Compare Methods"
|
37 |
+
])
|
38 |
+
|
39 |
+
# --- Ingredient Matching Tab ---
|
40 |
+
with tab_ingredient:
|
41 |
+
st.header("Match Products to Ingredients")
|
42 |
+
col1, col2 = st.columns(2)
|
43 |
+
with col1:
|
44 |
+
# Handle button click *before* rendering the text area
|
45 |
+
if st.button("Load Examples", key="ingredient_examples"):
|
46 |
+
st.session_state.ingredient_input = load_examples() # Update state for next rerun
|
47 |
+
|
48 |
+
# Input section - Use the session state value
|
49 |
+
text_input = st.text_area(
|
50 |
+
"Product Names (one per line)",
|
51 |
+
value=st.session_state.ingredient_input, # Use value from state
|
52 |
+
placeholder="Enter product names, one per line",
|
53 |
+
height=250,
|
54 |
+
key="ingredient_input_widget" # Use a different key for the widget itself if needed, or manage via value
|
55 |
+
)
|
56 |
+
# Update session state if user types manually
|
57 |
+
st.session_state.ingredient_input = text_input
|
58 |
+
|
59 |
+
use_expansion = st.checkbox(
|
60 |
+
"Use Description Expansion (AI)",
|
61 |
+
value=False,
|
62 |
+
key="ingredient_expansion",
|
63 |
+
help="Expand product descriptions using AI before matching"
|
64 |
+
)
|
65 |
+
top_n = st.slider("Top N Results", 1, 25, 10, step=1, key="ingredient_top_n")
|
66 |
+
confidence = st.slider("Similarity Threshold", 0.1, 0.9, 0.5, step=0.05, key="ingredient_confidence")
|
67 |
+
|
68 |
+
find_ingredients_btn = st.button("Find Similar Ingredients", type="primary", key="ingredient_find")
|
69 |
+
|
70 |
+
with col2:
|
71 |
+
# Results section
|
72 |
+
st.subheader("Results")
|
73 |
+
results_placeholder_ingredient = st.empty()
|
74 |
+
if find_ingredients_btn:
|
75 |
+
if st.session_state.ingredient_input: # Check state value
|
76 |
+
results_html = categorize_products(
|
77 |
+
st.session_state.ingredient_input,
|
78 |
+
False,
|
79 |
+
use_expansion,
|
80 |
+
top_n,
|
81 |
+
confidence
|
82 |
)
|
83 |
+
results_placeholder_ingredient.markdown(results_html, unsafe_allow_html=True)
|
84 |
+
else:
|
85 |
+
results_placeholder_ingredient.warning("Please enter product names.")
|
86 |
+
|
87 |
+
# --- Category Matching Tab ---
|
88 |
+
with tab_category:
|
89 |
+
st.header("Match Products to Categories")
|
90 |
+
col1, col2 = st.columns(2)
|
91 |
+
with col1:
|
92 |
+
if st.button("Load Examples", key="category_examples"):
|
93 |
+
st.session_state.category_input = load_examples()
|
94 |
+
|
95 |
+
category_text_input = st.text_area(
|
96 |
+
"Product Names (one per line)",
|
97 |
+
value=st.session_state.category_input,
|
98 |
+
placeholder="Enter product names, one per line",
|
99 |
+
height=250,
|
100 |
+
key="category_input_widget"
|
101 |
+
)
|
102 |
+
st.session_state.category_input = category_text_input
|
103 |
+
|
104 |
+
category_use_expansion = st.checkbox(
|
105 |
+
"Use Description Expansion (AI)",
|
106 |
+
value=False,
|
107 |
+
key="category_expansion",
|
108 |
+
help="Expand product descriptions using AI before matching"
|
109 |
+
)
|
110 |
+
category_top_n = st.slider("Top N Categories", 1, 10, 5, step=1, key="category_top_n")
|
111 |
+
category_confidence = st.slider("Matching Threshold", 0.1, 0.9, 0.5, step=0.05, key="category_confidence")
|
112 |
+
|
113 |
+
match_categories_btn = st.button("Match to Categories", type="primary", key="category_match")
|
114 |
+
|
115 |
+
with col2:
|
116 |
+
st.subheader("Results")
|
117 |
+
results_placeholder_category = st.empty()
|
118 |
+
if match_categories_btn:
|
119 |
+
if st.session_state.category_input:
|
120 |
+
results_html = categorize_products_by_category(
|
121 |
+
st.session_state.category_input,
|
122 |
+
False,
|
123 |
+
category_use_expansion,
|
124 |
+
category_top_n,
|
125 |
+
category_confidence
|
126 |
)
|
127 |
+
results_placeholder_category.markdown(results_html, unsafe_allow_html=True)
|
128 |
+
else:
|
129 |
+
results_placeholder_category.warning("Please enter product names.")
|
130 |
+
|
131 |
+
# --- Common function for Reranking Tabs ---
|
132 |
+
def create_reranking_ui(tab, tab_key_prefix, tab_name, backend_function, default_match="categories"):
|
133 |
+
with tab:
|
134 |
+
st.header(f"Match using {tab_name}")
|
135 |
+
col1, col2 = st.columns(2)
|
136 |
+
with col1:
|
137 |
+
if st.button("Load Examples", key=f"{tab_key_prefix}_examples"):
|
138 |
+
st.session_state[f"{tab_key_prefix}_input"] = load_examples()
|
139 |
+
|
140 |
+
tab_input_value = st.text_area(
|
141 |
+
"Product Names (one per line)",
|
142 |
+
value=st.session_state[f"{tab_key_prefix}_input"],
|
143 |
+
placeholder="Enter product names, one per line",
|
144 |
+
height=250,
|
145 |
+
key=f"{tab_key_prefix}_input_widget"
|
146 |
+
)
|
147 |
+
st.session_state[f"{tab_key_prefix}_input"] = tab_input_value # Update state
|
148 |
+
|
149 |
+
tab_expansion = st.checkbox(
|
150 |
+
"Use Description Expansion (AI)",
|
151 |
+
value=False,
|
152 |
+
key=f"{tab_key_prefix}_expansion",
|
153 |
+
help="Expand product descriptions using AI before matching"
|
154 |
+
)
|
155 |
+
tab_emb_top_n = st.slider("Embedding Top N Results", 1, 50, 20, step=1, key=f"{tab_key_prefix}_emb_top_n")
|
156 |
+
tab_top_n = st.slider("Final Top N Results", 1, 10, 5, step=1, key=f"{tab_key_prefix}_final_top_n")
|
157 |
+
tab_confidence = st.slider("Matching Threshold", 0.1, 0.9, 0.5, step=0.05, key=f"{tab_key_prefix}_confidence")
|
158 |
+
tab_match_type = st.radio(
|
159 |
+
"Match Type",
|
160 |
+
options=["categories", "ingredients"],
|
161 |
+
index=0 if default_match == "categories" else 1,
|
162 |
+
key=f"{tab_key_prefix}_match_type",
|
163 |
+
horizontal=True,
|
164 |
+
help="Choose whether to match against ingredients or categories"
|
165 |
+
)
|
166 |
+
|
167 |
+
tab_match_btn = st.button(f"Match using {tab_name}", type="primary", key=f"{tab_key_prefix}_match")
|
168 |
+
|
169 |
+
with col2:
|
170 |
+
st.subheader("Results")
|
171 |
+
results_placeholder_rerank = st.empty()
|
172 |
+
if tab_match_btn:
|
173 |
+
if st.session_state[f"{tab_key_prefix}_input"]:
|
174 |
+
results_html = backend_function(
|
175 |
+
st.session_state[f"{tab_key_prefix}_input"],
|
176 |
+
False,
|
177 |
+
tab_expansion,
|
178 |
+
tab_emb_top_n,
|
179 |
+
tab_top_n,
|
180 |
+
tab_confidence,
|
181 |
+
tab_match_type
|
182 |
)
|
183 |
+
results_placeholder_rerank.markdown(results_html, unsafe_allow_html=True)
|
184 |
+
else:
|
185 |
+
results_placeholder_rerank.warning("Please enter product names.")
|
186 |
+
|
187 |
+
# Create the reranking tabs
|
188 |
+
create_reranking_ui(tab_voyage, "voyage", "Voyage AI Reranking", categorize_products_with_voyage_reranking, "categories")
|
189 |
+
create_reranking_ui(tab_openai, "openai", "OpenAI Reranking", categorize_products_with_openai_reranking, "categories")
|
190 |
+
|
191 |
+
# --- Compare Methods Tab ---
|
192 |
+
with tab_compare:
|
193 |
+
st.header("Compare Matching Methods")
|
194 |
+
col1, col2 = st.columns(2)
|
195 |
+
with col1:
|
196 |
+
if st.button("Load Examples", key="compare_examples"):
|
197 |
+
st.session_state.compare_input = load_examples()
|
198 |
+
|
199 |
+
compare_product_input_value = st.text_area(
|
200 |
+
"Product Names (one per line)",
|
201 |
+
value=st.session_state.compare_input,
|
202 |
+
placeholder="4 Tbsp sweet pickle relish\nchocolate chips\nfresh parsley",
|
203 |
+
height=200,
|
204 |
+
key="compare_input_widget"
|
205 |
+
)
|
206 |
+
st.session_state.compare_input = compare_product_input_value # Update state
|
207 |
+
|
208 |
+
compare_embedding_top_n = st.slider(
|
209 |
+
"Initial embedding candidates",
|
210 |
+
min_value=5, max_value=50, value=20, step=5,
|
211 |
+
key="compare_emb_top_n"
|
212 |
+
)
|
213 |
+
compare_final_top_n = st.slider(
|
214 |
+
"Final results per method",
|
215 |
+
min_value=1, max_value=10, value=3, step=1,
|
216 |
+
key="compare_final_top_n"
|
217 |
+
)
|
218 |
+
compare_confidence_threshold = st.slider(
|
219 |
+
"Confidence threshold",
|
220 |
+
min_value=0.0, max_value=1.0, value=0.5, step=0.05,
|
221 |
+
key="compare_confidence"
|
222 |
+
)
|
223 |
+
compare_match_type = st.radio(
|
224 |
+
"Match Type",
|
225 |
+
options=["categories", "ingredients"],
|
226 |
+
index=0,
|
227 |
+
key="compare_match_type",
|
228 |
+
horizontal=True,
|
229 |
+
help="Choose whether to match against ingredients or categories"
|
230 |
+
)
|
231 |
+
compare_expansion = st.checkbox(
|
232 |
+
"Use Description Expansion (AI)",
|
233 |
+
value=False,
|
234 |
+
key="compare_expansion",
|
235 |
+
help="Expand product descriptions using AI before matching"
|
236 |
+
)
|
237 |
+
|
238 |
+
compare_btn = st.button("Compare Methods", type="primary", key="compare_run")
|
239 |
+
|
240 |
+
with col2:
|
241 |
+
st.subheader("Comparison Results")
|
242 |
+
results_placeholder_compare = st.empty()
|
243 |
+
if compare_btn:
|
244 |
+
if st.session_state.compare_input:
|
245 |
+
results_html = compare_ingredient_methods_ui(
|
246 |
+
st.session_state.compare_input,
|
247 |
compare_embedding_top_n,
|
248 |
compare_final_top_n,
|
249 |
compare_confidence_threshold,
|
250 |
compare_match_type,
|
251 |
compare_expansion
|
252 |
+
)
|
253 |
+
results_placeholder_compare.markdown(results_html, unsafe_allow_html=True)
|
254 |
+
else:
|
255 |
+
results_placeholder_compare.warning("Please enter product names.")
|
256 |
+
|
257 |
+
st.markdown("---")
|
258 |
+
st.markdown("Powered by Voyage AI embeddings • Built with Streamlit")
|
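The reranking and comparison tabs above expose three controls ("Embedding Top N Results", "Final Top N Results", and a threshold), which suggests a two-stage flow: shortlist by embedding similarity, then rerank the shortlist. A minimal sketch of that flow follows, under stated assumptions: `two_stage_match` and its `rerank` callable are hypothetical stand-ins, not the repository's `hybrid_category_matching` or `hybrid_ingredient_matching`, and the threshold is applied after the final cut, as the formatter comments in this commit suggest.

```python
from typing import Callable, Dict, List, Tuple


def two_stage_match(
    embedding_scores: Dict[str, float],
    rerank: Callable[[str], float],
    embedding_top_n: int = 20,
    final_top_n: int = 5,
    confidence_threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Shortlist candidates by embedding score, then rerank the shortlist.

    `rerank` is a hypothetical scoring function standing in for a call
    to a reranking service; it is not part of the original code.
    """
    # Stage 1: keep the best candidates by embedding similarity.
    candidates = sorted(
        embedding_scores, key=embedding_scores.get, reverse=True
    )[:embedding_top_n]

    # Stage 2: rescore only the shortlisted candidates.
    rescored = [(name, rerank(name)) for name in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)

    # Apply the threshold last, mirroring "filter at display time".
    return [
        (name, score)
        for name, score in rescored[:final_top_n]
        if score >= confidence_threshold
    ]
```

Keeping the first stage wide and the second stage narrow limits the number of (usually slower) reranking calls to `embedding_top_n` per product.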
ui_category_matching.py
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
-
import gradio as gr
|
2 |
-
from utils import SafeProgress
|
3 |
from category_matching import load_categories, match_products_to_categories
|
4 |
from ui_core import parse_input
|
5 |
from ui_formatters import format_categories_html
|
@@ -8,8 +8,9 @@ from openai_expansion import expand_product_descriptions
|
|
8 |
def categorize_products_by_category(product_input, is_file=False, use_expansion=False, top_n=10, confidence_threshold=0.5):
|
9 |
|
10 |
"""Categorize products by matching them to predefined categories"""
|
11 |
-
|
12 |
-
progress_tracker
|
|
|
13 |
|
14 |
# Parse input
|
15 |
product_names, error = parse_input(product_input, is_file)
|
@@ -19,15 +20,15 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
|
|
19 |
# Optional description expansion
|
20 |
expanded_descriptions = {}
|
21 |
if use_expansion:
|
22 |
-
progress_tracker(0.1, desc="Expanding product descriptions...")
|
23 |
-
expanded_descriptions = expand_product_descriptions(product_names
|
24 |
# Use expanded descriptions for matching if available
|
25 |
products_to_match = [expanded_descriptions.get(p, p) for p in product_names]
|
26 |
else:
|
27 |
products_to_match = product_names
|
28 |
|
29 |
# Load categories
|
30 |
-
progress_tracker(0.2, desc="Loading categories...")
|
31 |
categories = load_categories()
|
32 |
|
33 |
# Create a mapping from original product names to expanded versions
|
@@ -37,13 +38,13 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
|
|
37 |
product_to_expanded[product] = products_to_match[i]
|
38 |
|
39 |
# Match products to categories
|
40 |
-
progress_tracker(0.3, desc="Matching products to categories...")
|
41 |
match_results = match_products_to_categories(
|
42 |
products_to_match,
|
43 |
categories,
|
44 |
top_n=int(top_n),
|
45 |
-
confidence_threshold=confidence_threshold
|
46 |
-
progress
|
47 |
)
|
48 |
|
49 |
# Create a new dictionary mapping original product names to their results
|
@@ -53,7 +54,7 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
|
|
53 |
original_product_results[product] = match_results[expanded]
|
54 |
|
55 |
# Format results
|
56 |
-
progress_tracker(0.9, desc="Formatting results...")
|
57 |
output_html = "<div style='font-family: Arial, sans-serif; max-width: 100%; overflow-x: auto;'>"
|
58 |
output_html += f"<p style='color: #555;'>Matched {len(product_names)} products to categories.</p>"
|
59 |
|
@@ -75,5 +76,5 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
|
|
75 |
if not match_results:
|
76 |
output_html = "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>No results found. Please check your input or try different products.</div>"
|
77 |
|
78 |
-
progress_tracker(1.0, desc="Done!")
|
79 |
return output_html
|
|
|
1 |
+
# import gradio as gr # Removed Gradio import
|
2 |
+
# from utils import SafeProgress # Removed SafeProgress import
|
3 |
from category_matching import load_categories, match_products_to_categories
|
4 |
from ui_core import parse_input
|
5 |
from ui_formatters import format_categories_html
|
|
|
8 |
def categorize_products_by_category(product_input, is_file=False, use_expansion=False, top_n=10, confidence_threshold=0.5):
|
9 |
|
10 |
"""Categorize products by matching them to predefined categories"""
|
11 |
+
# Removed Gradio progress tracking
|
12 |
+
# progress_tracker = SafeProgress(gr.Progress())
|
13 |
+
# progress_tracker(0, desc="Starting categorization...")
|
14 |
|
15 |
# Parse input
|
16 |
product_names, error = parse_input(product_input, is_file)
|
|
|
20 |
# Optional description expansion
|
21 |
expanded_descriptions = {}
|
22 |
if use_expansion:
|
23 |
+
# progress_tracker(0.1, desc="Expanding product descriptions...") # Removed progress
|
24 |
+
expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
|
25 |
# Use expanded descriptions for matching if available
|
26 |
products_to_match = [expanded_descriptions.get(p, p) for p in product_names]
|
27 |
else:
|
28 |
products_to_match = product_names
|
29 |
|
30 |
# Load categories
|
31 |
+
# progress_tracker(0.2, desc="Loading categories...") # Removed progress
|
32 |
categories = load_categories()
|
33 |
|
34 |
# Create a mapping from original product names to expanded versions
|
|
|
38 |
product_to_expanded[product] = products_to_match[i]
|
39 |
|
40 |
# Match products to categories
|
41 |
+
# progress_tracker(0.3, desc="Matching products to categories...") # Removed progress
|
42 |
match_results = match_products_to_categories(
|
43 |
products_to_match,
|
44 |
categories,
|
45 |
top_n=int(top_n),
|
46 |
+
confidence_threshold=confidence_threshold
|
47 |
+
# Removed progress argument
|
48 |
)
|
49 |
|
50 |
# Create a new dictionary mapping original product names to their results
|
|
|
54 |
original_product_results[product] = match_results[expanded]
|
55 |
|
56 |
# Format results
|
57 |
+
# progress_tracker(0.9, desc="Formatting results...") # Removed progress
|
58 |
output_html = "<div style='font-family: Arial, sans-serif; max-width: 100%; overflow-x: auto;'>"
|
59 |
output_html += f"<p style='color: #555;'>Matched {len(product_names)} products to categories.</p>"
|
60 |
|
|
|
76 |
if not match_results:
|
77 |
output_html = "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>No results found. Please check your input or try different products.</div>"
|
78 |
|
79 |
+
# progress_tracker(1.0, desc="Done!") # Removed progress
|
80 |
return output_html
|
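The change above comments out every `progress_tracker(...)` call because the Gradio progress object is no longer available. One alternative, sketched here as an assumption and not as what the commit does, is to keep the backend UI-agnostic by accepting an optional progress callback that defaults to a no-op. The names `categorize`, `ProgressFn`, and `_no_progress` are hypothetical, and the matching step is a placeholder.

```python
from typing import Callable, List, Optional

# Hypothetical callback type: (fraction complete, description).
ProgressFn = Callable[[float, str], None]


def _no_progress(fraction: float, desc: str) -> None:
    """Default reporter: silently ignore progress updates."""


def categorize(
    product_names: List[str], progress: Optional[ProgressFn] = None
) -> List[str]:
    """Process products, reporting progress to any caller-supplied callback."""
    report = progress or _no_progress
    report(0.0, "Starting categorization...")

    results = []
    total = len(product_names)
    for i, name in enumerate(product_names, start=1):
        # Placeholder for the real matching step.
        results.append(name.strip().lower())
        report(i / total, f"Processed {name}")
    return results
```

With this shape, a Streamlit caller could pass a function that updates a progress bar, while scripts and tests simply omit the argument.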
ui_expanded_matching.py
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
-
import gradio as gr
|
2 |
-
from utils import SafeProgress
|
3 |
from embeddings import create_product_embeddings
|
4 |
from similarity import compute_similarities
|
5 |
from openai_expansion import expand_product_descriptions
|
@@ -11,12 +11,13 @@ import json
|
|
11 |
|
12 |
def categorize_products_with_openai_reranking(product_input, is_file=False, use_expansion=False,
|
13 |
embedding_top_n=20, top_n=10, confidence_threshold=0.5,
|
14 |
-
match_type="ingredients"
|
15 |
"""
|
16 |
Categorize products using OpenAI reranking with optional description expansion
|
17 |
"""
|
18 |
-
|
19 |
-
progress_tracker
|
|
|
20 |
# Parse input
|
21 |
product_names, error = parse_input(product_input, is_file)
|
22 |
if error:
|
@@ -28,8 +29,8 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
28 |
# Optional description expansion
|
29 |
expanded_descriptions = {}
|
30 |
if use_expansion:
|
31 |
-
progress_tracker(0.2, desc="Expanding product descriptions...")
|
32 |
-
expanded_descriptions = expand_product_descriptions(product_names
|
33 |
|
34 |
# Get shared OpenAI client
|
35 |
openai_client = get_openai_client()
|
@@ -38,13 +39,13 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
38 |
|
39 |
if match_type == "ingredients":
|
40 |
# Generate product embeddings
|
41 |
-
progress_tracker(0.4, desc="Generating product embeddings...")
|
42 |
if use_expansion and expanded_descriptions:
|
43 |
# Use expanded descriptions for embedding creation when available
|
44 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
|
45 |
# Map expanded descriptions back to original product names for consistent keys
|
46 |
product_embeddings = {}
|
47 |
-
temp_embeddings = create_product_embeddings(products_for_embedding,
|
48 |
|
49 |
# Ensure we use original product names as keys
|
50 |
for i, product_name in enumerate(product_names):
|
@@ -52,10 +53,10 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
52 |
product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
53 |
else:
|
54 |
# Standard embedding creation with just product names
|
55 |
-
product_embeddings = create_product_embeddings(product_names
|
56 |
|
57 |
# Compute embedding similarities for ingredients
|
58 |
-
progress_tracker(0.6, desc="Computing ingredient similarities...")
|
59 |
all_similarities = compute_similarities(embeddings, product_embeddings)
|
60 |
|
61 |
print(f"product_names: {product_names}")
|
@@ -65,7 +66,7 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
65 |
if not all_similarities:
|
66 |
return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No similarities found. Please try different product names.</div>"
|
67 |
|
68 |
-
progress_tracker(0.7, desc="Re-ranking with OpenAI...")
|
69 |
|
70 |
# Function for processing each product
|
71 |
def process_reranking(product):
|
@@ -104,29 +105,26 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
104 |
final_results = process_in_parallel(
|
105 |
items=product_names,
|
106 |
processor_func=process_reranking,
|
107 |
-
max_workers=min(10, len(product_names))
|
108 |
-
|
109 |
-
|
110 |
-
progress_end=0.9,
|
111 |
-
progress_desc="Re-ranking"
|
112 |
-
)
|
113 |
|
114 |
else: # categories
|
115 |
# Load category embeddings instead of JSON categories
|
116 |
-
progress_tracker(0.5, desc="Loading category embeddings...")
|
117 |
category_embeddings = load_category_embeddings()
|
118 |
|
119 |
if not category_embeddings:
|
120 |
return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No category embeddings found. Please check that the embeddings file exists at data/category_embeddings.pickle.</div>"
|
121 |
|
122 |
# Generate product embeddings
|
123 |
-
progress_tracker(0.6, desc="Generating product embeddings...")
|
124 |
if use_expansion and expanded_descriptions:
|
125 |
# Use expanded descriptions for embedding creation when available
|
126 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
|
127 |
# Map expanded descriptions back to original product names for consistent keys
|
128 |
product_embeddings = {}
|
129 |
-
temp_embeddings = create_product_embeddings(products_for_embedding,
|
130 |
|
131 |
# Ensure we use original product names as keys
|
132 |
for i, product_name in enumerate(product_names):
|
@@ -134,10 +132,10 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
134 |
product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
135 |
else:
|
136 |
# Standard embedding creation with just product names
|
137 |
-
product_embeddings = create_product_embeddings(product_names
|
138 |
|
139 |
# Compute embedding similarities for categories
|
140 |
-
progress_tracker(0.7, desc="Computing category similarities...")
|
141 |
all_similarities = compute_similarities(category_embeddings, product_embeddings)
|
142 |
|
143 |
if not all_similarities:
|
@@ -150,7 +148,7 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
150 |
needed_category_ids.add(category_id)
|
151 |
|
152 |
# Load only the needed categories from JSON
|
153 |
-
progress_tracker(0.75, desc="Loading category descriptions...")
|
154 |
category_descriptions = {}
|
155 |
if needed_category_ids:
|
156 |
try:
|
@@ -211,15 +209,12 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
211 |
final_results = process_in_parallel(
|
212 |
items=product_names,
|
213 |
processor_func=process_category_matching,
|
214 |
-
max_workers=min(10, len(product_names))
|
215 |
-
|
216 |
-
|
217 |
-
progress_end=0.9,
|
218 |
-
progress_desc="Category matching"
|
219 |
-
)
|
220 |
|
221 |
# Format results
|
222 |
-
progress_tracker(0.9, desc="Formatting results...")
|
223 |
|
224 |
# Create a list of result dictionaries in consistent format
|
225 |
formatted_results = []
|
@@ -259,5 +254,5 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
|
|
259 |
confidence_threshold=confidence_threshold # Pass the threshold to the formatter
|
260 |
)
|
261 |
|
262 |
-
progress_tracker(1.0, desc="Done!")
|
263 |
return result_html
|
|
|
1 |
+
# import gradio as gr # Removed Gradio import
|
2 |
+
# from utils import SafeProgress # Removed SafeProgress import
|
3 |
from embeddings import create_product_embeddings
|
4 |
from similarity import compute_similarities
|
5 |
from openai_expansion import expand_product_descriptions
|
|
|
11 |
|
12 |
def categorize_products_with_openai_reranking(product_input, is_file=False, use_expansion=False,
|
13 |
embedding_top_n=20, top_n=10, confidence_threshold=0.5,
|
14 |
+
match_type="ingredients"): # Removed progress parameter
|
15 |
"""
|
16 |
Categorize products using OpenAI reranking with optional description expansion
|
17 |
"""
|
18 |
+
# Removed Gradio progress tracking
|
19 |
+
# progress_tracker = SafeProgress(progress)
|
20 |
+
# progress_tracker(0, desc="Starting OpenAI reranking...")
|
21 |
# Parse input
|
22 |
product_names, error = parse_input(product_input, is_file)
|
23 |
if error:
|
|
|
29 |
# Optional description expansion
|
30 |
expanded_descriptions = {}
|
31 |
if use_expansion:
|
32 |
+
# progress_tracker(0.2, desc="Expanding product descriptions...") # Removed progress
|
33 |
+
expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
|
34 |
|
35 |
# Get shared OpenAI client
|
36 |
openai_client = get_openai_client()
|
|
|
39 |
|
40 |
if match_type == "ingredients":
|
41 |
# Generate product embeddings
|
42 |
+
# progress_tracker(0.4, desc="Generating product embeddings...") # Removed progress
|
43 |
if use_expansion and expanded_descriptions:
|
44 |
# Use expanded descriptions for embedding creation when available
|
45 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
|
46 |
# Map expanded descriptions back to original product names for consistent keys
|
47 |
product_embeddings = {}
|
48 |
+
temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names
|
49 |
|
50 |
# Ensure we use original product names as keys
|
51 |
for i, product_name in enumerate(product_names):
|
|
|
53 |
product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
54 |
else:
|
55 |
# Standard embedding creation with just product names
|
56 |
+
product_embeddings = create_product_embeddings(product_names) # Removed progress
|
57 |
|
58 |
# Compute embedding similarities for ingredients
|
59 |
+
# progress_tracker(0.6, desc="Computing ingredient similarities...") # Removed progress
|
60 |
all_similarities = compute_similarities(embeddings, product_embeddings)
|
61 |
|
62 |
print(f"product_names: {product_names}")
|
|
|
66 |
if not all_similarities:
|
67 |
return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No similarities found. Please try different product names.</div>"
|
68 |
|
69 |
+
# progress_tracker(0.7, desc="Re-ranking with OpenAI...") # Removed progress
|
70 |
|
71 |
# Function for processing each product
|
72 |
def process_reranking(product):
|
|
|
105 |
final_results = process_in_parallel(
|
106 |
items=product_names,
|
107 |
processor_func=process_reranking,
|
108 |
+
max_workers=min(10, len(product_names)) # Moved max_workers inside
|
109 |
+
# Removed progress tracking arguments
|
110 |
+
) # Corrected closing parenthesis
|
111 |
|
112 |
else: # categories
|
113 |
# Load category embeddings instead of JSON categories
|
114 |
+
# progress_tracker(0.5, desc="Loading category embeddings...") # Removed progress
|
115 |
category_embeddings = load_category_embeddings()
|
116 |
|
117 |
if not category_embeddings:
|
118 |
return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No category embeddings found. Please check that the embeddings file exists at data/category_embeddings.pickle.</div>"
|
119 |
|
120 |
# Generate product embeddings
|
121 |
+
# progress_tracker(0.6, desc="Generating product embeddings...") # Removed progress
|
122 |
if use_expansion and expanded_descriptions:
|
123 |
# Use expanded descriptions for embedding creation when available
|
124 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
|
125 |
# Map expanded descriptions back to original product names for consistent keys
|
126 |
product_embeddings = {}
|
127 |
+
temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names
|
128 |
|
129 |
# Ensure we use original product names as keys
|
130 |
for i, product_name in enumerate(product_names):
|
|
|
132 |
product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
133 |
else:
|
134 |
# Standard embedding creation with just product names
|
135 |
+
product_embeddings = create_product_embeddings(product_names) # Removed progress
|
136 |
|
137 |
# Compute embedding similarities for categories
|
138 |
+
# progress_tracker(0.7, desc="Computing category similarities...") # Removed progress
|
139 |
all_similarities = compute_similarities(category_embeddings, product_embeddings)
|
140 |
|
141 |
if not all_similarities:
|
|
|
148 |
needed_category_ids.add(category_id)
|
149 |
|
150 |
# Load only the needed categories from JSON
|
151 |
+
# progress_tracker(0.75, desc="Loading category descriptions...") # Removed progress
|
152 |
category_descriptions = {}
|
153 |
if needed_category_ids:
|
154 |
try:
|
|
|
209 |
final_results = process_in_parallel(
|
210 |
items=product_names,
|
211 |
processor_func=process_category_matching,
|
212 |
+
max_workers=min(10, len(product_names)) # Restored max_workers inside the call
|
213 |
+
# Removed progress tracking arguments
|
214 |
+
) # Correctly placed closing parenthesis
|
215 |
|
216 |
# Format results
|
217 |
+
# progress_tracker(0.9, desc="Formatting results...") # Removed progress
|
218 |
|
219 |
# Create a list of result dictionaries in consistent format
|
220 |
formatted_results = []
|
|
|
254 |
confidence_threshold=confidence_threshold # Pass the threshold to the formatter
|
255 |
)
|
256 |
|
257 |
+
# progress_tracker(1.0, desc="Done!") # Removed progress
|
258 |
return result_html
|
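Both branches of the function above embed the expanded descriptions but store the results under the original product names, so later lookups stay consistent. A minimal sketch of that re-keying step follows. It assumes an `embed` callable that returns a dict keyed by input text; this is a hypothetical stand-in for `create_product_embeddings`, whose real signature and return shape are not shown in full here.

```python
from typing import Callable, Dict, List

Vector = List[float]


def embed_with_expansion(
    product_names: List[str],
    expanded_descriptions: Dict[str, str],
    embed: Callable[[List[str]], Dict[str, Vector]],
) -> Dict[str, Vector]:
    """Embed expanded text where available, keyed by original product name.

    `embed` is a hypothetical stand-in for the project's embedding call.
    """
    # Fall back to the raw name when no expansion exists.
    texts = [expanded_descriptions.get(name, name) for name in product_names]
    by_text = embed(texts)

    # Re-key by original name; skip any text the embedder did not return.
    return {
        name: by_text[text]
        for name, text in zip(product_names, texts)
        if text in by_text
    }
```

Pairing names and texts with `zip` avoids the index bookkeeping of an `enumerate` loop while producing the same mapping.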
ui_hybrid_matching.py
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
-
import gradio as gr
|
2 |
-
from utils import SafeProgress
|
3 |
from category_matching import load_categories, hybrid_category_matching
|
4 |
from similarity import hybrid_ingredient_matching, compute_similarities
|
5 |
from ui_core import embeddings, parse_input
|
@@ -9,12 +9,13 @@ from api_utils import get_voyage_client
|
|
9 |
|
10 |
def categorize_products_with_voyage_reranking(product_input, is_file=False, use_expansion=False,
|
11 |
embedding_top_n=20, final_top_n=5, confidence_threshold=0.5,
|
12 |
-
match_type="categories"
|
13 |
"""
|
14 |
Categorize products using Voyage reranking with optional description expansion
|
15 |
"""
|
16 |
-
|
17 |
-
progress_tracker
|
|
|
18 |
|
19 |
# Parse input
|
20 |
product_names, error = parse_input(product_input, is_file)
|
@@ -24,24 +25,24 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
|
|
24 |
# Optional description expansion
|
25 |
expanded_descriptions = {}
|
26 |
if use_expansion:
|
27 |
-
progress_tracker(0.3, desc="Expanding product descriptions...")
|
28 |
-
expanded_descriptions = expand_product_descriptions(product_names
|
29 |
|
30 |
match_results = {}
|
31 |
if match_type == "categories":
|
32 |
# Load categories
|
33 |
-
progress_tracker(0.2, desc="Loading categories...")
|
34 |
categories = load_categories()
|
35 |
|
36 |
# Use hybrid approach for categories with optional expanded descriptions
|
37 |
-
progress_tracker(0.5, desc="Finding and re-ranking categories...")
|
38 |
match_results = hybrid_category_matching(
|
39 |
product_names, categories,
|
40 |
embedding_top_n=int(embedding_top_n),
|
41 |
-
final_top_n=int(final_top_n),
|
42 |
confidence_threshold=0.0, # Don't apply threshold here - do it in display
|
43 |
-
expanded_descriptions=expanded_descriptions if use_expansion else None
|
44 |
-
progress
|
45 |
)
|
46 |
else: # ingredients
|
47 |
# Validate embeddings are loaded
|
@@ -49,18 +50,18 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
|
|
49 |
return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No ingredient embeddings loaded. Please check that the embeddings file exists and is properly formatted.</div>"
|
50 |
|
51 |
# Use hybrid approach for ingredients with optional expanded descriptions
|
52 |
-
progress_tracker(0.5, desc="Finding and re-ranking ingredients...")
|
53 |
match_results = hybrid_ingredient_matching(
|
54 |
product_names, embeddings,
|
55 |
embedding_top_n=int(embedding_top_n),
|
56 |
-
final_top_n=int(final_top_n),
|
57 |
confidence_threshold=0.0, # Don't apply threshold here - do it in display
|
58 |
-
expanded_descriptions=expanded_descriptions if use_expansion else None
|
59 |
-
progress
|
60 |
)
|
61 |
|
62 |
# Format results
|
63 |
-
progress_tracker(0.9, desc="Formatting results...")
|
64 |
|
65 |
# Convert to unified format for formatter
|
66 |
formatted_results = []
|
@@ -109,7 +110,7 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
|
|
109 |
confidence_threshold=confidence_threshold # Pass the threshold to the formatter
|
110 |
)
|
111 |
|
112 |
-
progress_tracker(1.0, desc="Done!")
|
113 |
return result_html
|
114 |
|
115 |
# Update the function in ui_hybrid_matching.py
|
@@ -117,13 +118,14 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
|
|
117 |
embedding_top_n=20, final_top_n=5,
|
118 |
confidence_threshold=0.5,
|
119 |
expanded_descriptions=None,
|
120 |
-
|
121 |
"""Use Voyage AI for reranking instead of OpenAI"""
|
122 |
-
from utils import SafeProgress
|
123 |
from embeddings import create_product_embeddings
|
124 |
|
125 |
-
|
126 |
-
progress_tracker(
|
|
|
127 |
|
128 |
# Stage 1: Same as before - use embeddings to find candidates
|
129 |
if expanded_descriptions:
|
@@ -131,7 +133,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
|
|
131 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
|
132 |
# Map expanded descriptions back to original product names for consistent keys
|
133 |
product_embeddings = {}
|
134 |
-
temp_embeddings = create_product_embeddings(products_for_embedding,
|
135 |
|
136 |
# Ensure we use original product names as keys
|
137 |
for i, product_name in enumerate(products):
|
@@ -139,7 +141,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
|
|
139 |
product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
140 |
else:
|
141 |
# Standard embedding creation with just product names
|
142 |
-
product_embeddings = create_product_embeddings(products
|
143 |
|
144 |
similarities = compute_similarities(ingredients_dict, product_embeddings)
|
145 |
|
@@ -148,7 +150,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
|
|
148 |
for product, product_similarities in similarities.items():
|
149 |
embedding_results[product] = product_similarities[:embedding_top_n]
|
150 |
|
151 |
-
progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI")
|
152 |
|
153 |
# Initialize Voyage client
|
154 |
voyage_client = get_voyage_client()
|
@@ -157,7 +159,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
|
|
157 |
final_results = {}
|
158 |
|
159 |
for i, product in enumerate(products):
|
160 |
-
progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}")
|
161 |
|
162 |
if product not in embedding_results or not embedding_results[product]:
|
163 |
final_results[product] = []
|
@@ -197,7 +199,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
|
|
197 |
# Fall back to embedding results
|
198 |
final_results[product] = candidates[:1]
|
199 |
|
200 |
-
progress_tracker(1.0, desc="Voyage ingredient matching complete")
|
201 |
return final_results
|
202 |
|
203 |
# Add this function to ui_hybrid_matching.py
|
@@ -206,13 +208,14 @@ def hybrid_category_matching_voyage(products, categories_dict,
|
|
206 |
embedding_top_n=20, final_top_n=5,
|
207 |
confidence_threshold=0.5,
|
208 |
expanded_descriptions=None,
|
209 |
-
|
210 |
"""Use Voyage AI for reranking categories instead of OpenAI"""
|
211 |
-
from utils import SafeProgress
|
212 |
from embeddings import create_product_embeddings
|
213 |
|
214 |
-
|
215 |
-
progress_tracker(
|
|
|
216 |
|
217 |
# Stage 1: Same as before - use embeddings to find candidates
|
218 |
if expanded_descriptions:
|
@@ -220,7 +223,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
|
|
220 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
|
221 |
# Map expanded descriptions back to original product names for consistent keys
|
222 |
product_embeddings = {}
|
223 |
-
temp_embeddings = create_product_embeddings(products_for_embedding,
|
224 |
|
225 |
# Ensure we use original product names as keys
|
226 |
for i, product_name in enumerate(products):
|
@@ -228,7 +231,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
|
|
228 |
product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
229 |
else:
|
230 |
# Standard embedding creation with just product names
|
231 |
-
product_embeddings = create_product_embeddings(products
|
232 |
|
233 |
from similarity import compute_similarities
|
234 |
similarities = compute_similarities(categories_dict, product_embeddings)
|
@@ -238,7 +241,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
|
|
238 |
for product, product_similarities in similarities.items():
|
239 |
embedding_results[product] = product_similarities[:embedding_top_n]
|
240 |
|
241 |
-
progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI")
|
242 |
|
243 |
# Initialize Voyage client
|
244 |
voyage_client = get_voyage_client()
|
@@ -246,7 +249,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
|
|
246 |
# Stage 2: Re-rank using Voyage AI
|
247 |
final_results = {}
|
248 |
for i, product in enumerate(products):
|
249 |
-
progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}")
|
250 |
|
251 |
if product not in embedding_results or not embedding_results[product]:
|
252 |
final_results[product] = []
|
@@ -286,5 +289,5 @@ def hybrid_category_matching_voyage(products, categories_dict,
|
|
286 |
# Fall back to embedding results
|
287 |
final_results[product] = candidates[:1]
|
288 |
|
289 |
-
progress_tracker(1.0, desc="Voyage category matching complete")
|
290 |
return final_results
|
|
|
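Review note: the removed lines above pass a Gradio-style `progress_tracker(fraction, desc=...)` through every stage. The following is a minimal sketch, not the repository's code, of how stage progress could stay UI-agnostic so the matching functions run with or without a front end. The names `ProgressFn`, `_noop`, and `rerank_all` are hypothetical and exist only for illustration; the fractions mirror the 0.1, 0.4, 0.4 to 0.9, and 1.0 steps visible in the diff.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical callback type: receives a fraction in [0, 1] and a description.
ProgressFn = Callable[[float, str], None]


def _noop(fraction: float, desc: str) -> None:
    """Default reporter used when the caller does not track progress."""


def rerank_all(products: List[str],
               report: Optional[ProgressFn] = None) -> Dict[str, list]:
    """Illustrative loop that reports progress through an optional callback."""
    report = report or _noop
    report(0.1, "Stage 1: Finding candidates with embeddings")
    report(0.4, "Stage 2: Re-ranking")
    results: Dict[str, list] = {}
    for i, product in enumerate(products):
        # Same 0.4 -> 0.9 range as the progress calls removed in the diff.
        report(0.4 + 0.5 * i / len(products), f"Re-ranking: {product}")
        results[product] = []  # placeholder for the real matches
    report(1.0, "Done")
    return results


if __name__ == "__main__":
    # Usage: print each progress event instead of drawing a bar.
    rerank_all(["oat milk", "tofu"], lambda f, d: print(f"{f:.2f} {d}"))
```

With this shape, a Streamlit front end could pass a small wrapper around a bar created with `st.progress`, while batch scripts and tests pass nothing at all.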
1 |
+
# import gradio as gr # Removed Gradio import
|
2 |
+
# from utils import SafeProgress # Removed SafeProgress import
|
3 |
from category_matching import load_categories, hybrid_category_matching
|
4 |
from similarity import hybrid_ingredient_matching, compute_similarities
|
5 |
from ui_core import embeddings, parse_input
|
|
|
9 |
|
10 |
def categorize_products_with_voyage_reranking(product_input, is_file=False, use_expansion=False,
|
11 |
embedding_top_n=20, final_top_n=5, confidence_threshold=0.5,
|
12 |
+
match_type="categories"): # Removed progress parameter
|
13 |
"""
|
14 |
Categorize products using Voyage reranking with optional description expansion
|
15 |
"""
|
16 |
+
# Removed Gradio progress tracking
|
17 |
+
# progress_tracker = SafeProgress(progress)
|
18 |
+
# progress_tracker(0, desc=f"Starting Voyage reranking for {match_type}...")
|
19 |
|
20 |
# Parse input
|
21 |
product_names, error = parse_input(product_input, is_file)
|
|
|
25 |
# Optional description expansion
|
26 |
expanded_descriptions = {}
|
27 |
if use_expansion:
|
28 |
+
# progress_tracker(0.3, desc="Expanding product descriptions...") # Removed progress
|
29 |
+
expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
|
30 |
|
31 |
match_results = {}
|
32 |
if match_type == "categories":
|
33 |
# Load categories
|
34 |
+
# progress_tracker(0.2, desc="Loading categories...") # Removed progress
|
35 |
categories = load_categories()
|
36 |
|
37 |
# Use hybrid approach for categories with optional expanded descriptions
|
38 |
+
# progress_tracker(0.5, desc="Finding and re-ranking categories...") # Removed progress
|
39 |
match_results = hybrid_category_matching(
|
40 |
product_names, categories,
|
41 |
embedding_top_n=int(embedding_top_n),
|
42 |
+
final_top_n=int(final_top_n),
|
43 |
confidence_threshold=0.0, # Don't apply threshold here - do it in display
|
44 |
+
expanded_descriptions=expanded_descriptions if use_expansion else None
|
45 |
+
# Removed progress argument
|
46 |
)
|
47 |
else: # ingredients
|
48 |
# Validate embeddings are loaded
|
|
|
50 |
return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No ingredient embeddings loaded. Please check that the embeddings file exists and is properly formatted.</div>"
|
51 |
|
52 |
# Use hybrid approach for ingredients with optional expanded descriptions
|
53 |
+
# progress_tracker(0.5, desc="Finding and re-ranking ingredients...") # Removed progress
|
54 |
match_results = hybrid_ingredient_matching(
|
55 |
product_names, embeddings,
|
56 |
embedding_top_n=int(embedding_top_n),
|
57 |
+
final_top_n=int(final_top_n),
|
58 |
confidence_threshold=0.0, # Don't apply threshold here - do it in display
|
59 |
+
expanded_descriptions=expanded_descriptions if use_expansion else None
|
60 |
+
# Removed progress argument
|
61 |
)
|
62 |
|
63 |
# Format results
|
64 |
+
# progress_tracker(0.9, desc="Formatting results...") # Removed progress
|
65 |
|
66 |
# Convert to unified format for formatter
|
67 |
formatted_results = []
|
|
|
110 |
confidence_threshold=confidence_threshold # Pass the threshold to the formatter
|
111 |
)
|
112 |
|
113 |
+
# progress_tracker(1.0, desc="Done!") # Removed progress
|
114 |
return result_html
|
115 |
|
116 |
# Update the function in ui_hybrid_matching.py
|
|
|
118 |
embedding_top_n=20, final_top_n=5,
|
119 |
confidence_threshold=0.5,
|
120 |
expanded_descriptions=None,
|
121 |
+
): # Removed progress parameter
|
122 |
"""Use Voyage AI for reranking instead of OpenAI"""
|
123 |
+
# from utils import SafeProgress # Removed SafeProgress import
|
124 |
from embeddings import create_product_embeddings
|
125 |
|
126 |
+
# Removed Gradio progress tracking
|
127 |
+
# progress_tracker = SafeProgress(progress, desc="Voyage ingredient matching")
|
128 |
+
# progress_tracker(0.1, desc="Stage 1: Finding candidates with embeddings")
|
129 |
|
130 |
# Stage 1: Same as before - use embeddings to find candidates
|
131 |
if expanded_descriptions:
|
|
|
133 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
|
134 |
# Map expanded descriptions back to original product names for consistent keys
|
135 |
product_embeddings = {}
|
136 |
+
temp_embeddings = create_product_embeddings(products_for_embedding, original_products=products) # Removed progress, pass original names
|
137 |
|
138 |
# Ensure we use original product names as keys
|
139 |
for i, product_name in enumerate(products):
|
|
|
141 |
product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
142 |
else:
|
143 |
# Standard embedding creation with just product names
|
144 |
+
product_embeddings = create_product_embeddings(products) # Removed progress
|
145 |
|
146 |
similarities = compute_similarities(ingredients_dict, product_embeddings)
|
147 |
|
|
|
150 |
for product, product_similarities in similarities.items():
|
151 |
embedding_results[product] = product_similarities[:embedding_top_n]
|
152 |
|
153 |
+
# progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI") # Removed progress
|
154 |
|
155 |
# Initialize Voyage client
|
156 |
voyage_client = get_voyage_client()
|
|
|
159 |
final_results = {}
|
160 |
|
161 |
for i, product in enumerate(products):
|
162 |
+
# progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}") # Removed progress
|
163 |
|
164 |
if product not in embedding_results or not embedding_results[product]:
|
165 |
final_results[product] = []
|
|
|
199 |
# Fall back to embedding results
|
200 |
final_results[product] = candidates[:1]
|
201 |
|
202 |
+
# progress_tracker(1.0, desc="Voyage ingredient matching complete") # Removed progress
|
203 |
return final_results
|
204 |
|
205 |
# Add this function to ui_hybrid_matching.py
|
|
|
208 |
embedding_top_n=20, final_top_n=5,
|
209 |
confidence_threshold=0.5,
|
210 |
expanded_descriptions=None,
|
211 |
+
): # Removed progress parameter
|
212 |
"""Use Voyage AI for reranking categories instead of OpenAI"""
|
213 |
+
# from utils import SafeProgress # Removed SafeProgress import
|
214 |
from embeddings import create_product_embeddings
|
215 |
|
216 |
+
# Removed Gradio progress tracking
|
217 |
+
# progress_tracker = SafeProgress(progress, desc="Voyage category matching")
|
218 |
+
# progress_tracker(0.1, desc="Stage 1: Finding candidate categories with embeddings")
|
219 |
|
220 |
# Stage 1: Same as before - use embeddings to find candidates
|
221 |
if expanded_descriptions:
|
|
|
223 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
|
224 |
# Map expanded descriptions back to original product names for consistent keys
|
225 |
product_embeddings = {}
|
226 |
+
temp_embeddings = create_product_embeddings(products_for_embedding, original_products=products) # Removed progress, pass original names
|
227 |
|
228 |
# Ensure we use original product names as keys
|
229 |
for i, product_name in enumerate(products):
|
|
|
231 |
product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
232 |
else:
|
233 |
# Standard embedding creation with just product names
|
234 |
+
product_embeddings = create_product_embeddings(products) # Removed progress
|
235 |
|
236 |
from similarity import compute_similarities
|
237 |
similarities = compute_similarities(categories_dict, product_embeddings)
|
|
|
241 |
for product, product_similarities in similarities.items():
|
242 |
embedding_results[product] = product_similarities[:embedding_top_n]
|
243 |
|
244 |
+
# progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI") # Removed progress
|
245 |
|
246 |
# Initialize Voyage client
|
247 |
voyage_client = get_voyage_client()
|
|
|
249 |
# Stage 2: Re-rank using Voyage AI
|
250 |
final_results = {}
|
251 |
for i, product in enumerate(products):
|
252 |
+
# progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}") # Removed progress
|
253 |
|
254 |
if product not in embedding_results or not embedding_results[product]:
|
255 |
final_results[product] = []
|
|
|
289 |
# Fall back to embedding results
|
290 |
final_results[product] = candidates[:1]
|
291 |
|
292 |
+
# progress_tracker(1.0, desc="Voyage category matching complete") # Removed progress
|
293 |
return final_results
|
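Review note: both `hybrid_ingredient_matching_voyage` and `hybrid_category_matching_voyage` follow the same two-stage shape: keep the top `embedding_top_n` embedding candidates, rerank them, and fall back to `candidates[:1]` if reranking fails. The sketch below illustrates that control flow under stated assumptions. The reranker is injected as a hypothetical `rerank_fn` callable rather than a real Voyage AI client call, candidates are assumed to be `(label, score)` pairs sorted best first, and `two_stage_match` is an illustrative name, not a function in this repository. The lines hidden between hunks may differ from this sketch.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]  # (label, score)
# Hypothetical reranker: takes a query and candidate labels,
# returns (label, relevance_score) pairs in any order.
RerankFn = Callable[[str, List[str]], List[Candidate]]


def two_stage_match(products: List[str],
                    similarities: Dict[str, List[Candidate]],
                    rerank_fn: RerankFn,
                    embedding_top_n: int = 20,
                    final_top_n: int = 5,
                    confidence_threshold: float = 0.5,
                    ) -> Dict[str, List[Candidate]]:
    final_results: Dict[str, List[Candidate]] = {}
    for product in products:
        # Stage 1: keep the best embedding candidates (assumed sorted, best first).
        candidates = similarities.get(product, [])[:embedding_top_n]
        if not candidates:
            final_results[product] = []
            continue
        try:
            # Stage 2: rerank the candidate labels against the product name.
            reranked = rerank_fn(product, [label for label, _ in candidates])
            ranked = sorted(reranked, key=lambda c: c[1], reverse=True)
            kept = [c for c in ranked if c[1] >= confidence_threshold]
            final_results[product] = kept[:final_top_n]
        except Exception:
            # Fall back to the single best embedding candidate, as in the diff.
            final_results[product] = candidates[:1]
    return final_results
```

Injecting the reranker keeps the fallback path easy to exercise: a test can pass a function that raises and check that the top embedding candidate is returned.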
ui_ingredient_matching.py
CHANGED
@@ -25,7 +25,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
|
|
25 |
expanded_descriptions = {}
|
26 |
if use_expansion:
|
27 |
progress_tracker(0.2, desc="Expanding product descriptions...")
|
28 |
-
expanded_descriptions = expand_product_descriptions(product_names
|
29 |
|
30 |
# Create embeddings
|
31 |
progress_tracker(0.4, desc="Generating product embeddings...")
|
@@ -34,7 +34,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
|
|
34 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
|
35 |
# Map expanded descriptions back to original product names for consistent keys
|
36 |
products_embeddings = {}
|
37 |
-
temp_embeddings = create_product_embeddings(products_for_embedding,
|
38 |
|
39 |
# Ensure we use original product names as keys
|
40 |
for i, product_name in enumerate(product_names):
|
@@ -42,7 +42,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
|
|
42 |
products_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
43 |
else:
|
44 |
# Standard embedding creation with just product names
|
45 |
-
products_embeddings = create_product_embeddings(product_names
|
46 |
|
47 |
if not products_embeddings:
|
48 |
return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: Failed to generate product embeddings. Please try again with different product names.</div>"
|
|
|
25 |
expanded_descriptions = {}
|
26 |
if use_expansion:
|
27 |
progress_tracker(0.2, desc="Expanding product descriptions...")
|
28 |
+
expanded_descriptions = expand_product_descriptions(product_names) # Removed progress
|
29 |
|
30 |
# Create embeddings
|
31 |
progress_tracker(0.4, desc="Generating product embeddings...")
|
|
|
34 |
products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
|
35 |
# Map expanded descriptions back to original product names for consistent keys
|
36 |
products_embeddings = {}
|
37 |
+
temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names for keys
|
38 |
|
39 |
# Ensure we use original product names as keys
|
40 |
for i, product_name in enumerate(product_names):
|
|
|
42 |
products_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
|
43 |
else:
|
44 |
# Standard embedding creation with just product names
|
45 |
+
products_embeddings = create_product_embeddings(product_names) # Removed progress
|
46 |
|
47 |
if not products_embeddings:
|
48 |
return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: Failed to generate product embeddings. Please try again with different product names.</div>"
|
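Review note: the hunks above embed the expanded descriptions but store the vectors under the original product names, so later lookups stay keyed by what the user typed. A minimal sketch of that remapping follows, assuming a hypothetical `embed_fn` that returns a dict keyed by the input text. The real `create_product_embeddings` signature, including how it treats `original_products`, is not fully visible in this diff, and `embed_with_original_keys` is an illustrative name only.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
# Hypothetical embedder: maps each input text to a vector, keyed by that text.
EmbedFn = Callable[[List[str]], Dict[str, Vector]]


def embed_with_original_keys(names: List[str],
                             embed_fn: EmbedFn,
                             expanded_descriptions: Optional[Dict[str, str]] = None,
                             ) -> Dict[str, Vector]:
    """Embed expanded text when available, but key results by original name."""
    if not expanded_descriptions:
        # Standard embedding creation with just product names.
        return embed_fn(names)

    # Use the expanded description where one exists, else the name itself.
    texts = [expanded_descriptions.get(name, name) for name in names]
    by_text = embed_fn(texts)

    result: Dict[str, Vector] = {}
    for name, text in zip(names, texts):
        if text in by_text:  # skip texts the embedder did not return
            result[name] = by_text[text]
    return result
```

An empty result from this helper corresponds to the `if not products_embeddings:` error branch shown in the diff.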