esilver committed on
Commit 164730f · 1 Parent(s): e314c06

Use streamlit

Files changed (10)
  1. README.md +66 -31
  2. app.py +43 -22
  3. comparison.py +40 -37
  4. embeddings.py +3 -2
  5. requirements.txt +3 -1
  6. ui.py +244 -208
  7. ui_category_matching.py +13 -12
  8. ui_expanded_matching.py +27 -32
  9. ui_hybrid_matching.py +39 -36
  10. ui_ingredient_matching.py +3 -3
README.md CHANGED
@@ -1,52 +1,87 @@
  ---
  license: mit
- title: Demo
- sdk: gradio
  emoji: 🚀
  colorFrom: purple
  colorTo: yellow
- sdk_version: 5.22.0
  ---
- # Product Categorization App - One-Click Solution
 
- This is a turnkey solution for categorizing products based on their similarity to ingredients using Voyage AI.
 
  ## Quick Start
 
- 1. Place your `ingredient_embeddings_voyageai.pkl` file in the same folder as this README
- 2. Run the application:
 
- ```bash
- bash run_app.sh
- ```
 
- 3. That's it! A browser window will open with the app, and a public URL will be created for sharing
-
- ## What You Can Do
-
- - **Text Input:** Enter product names one per line
- - **File Upload:** Upload a JSON file with product data
- - Adjust the number of categories and Similarity Threshold
- - View the categorization results with confidence scores
 
  ## Hosting on Hugging Face Spaces
 
- For permanent, free hosting on Gradio:
-
- 1. Create a free account on [Hugging Face](https://huggingface.co/)
- 2. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
- 3. Click "Create a Space"
- 4. Select "Gradio" as the SDK
- 5. Upload all files (including your embeddings file) to the space
- 6. Your app will be automatically deployed!
 
  ## Files Included
 
- - `app.py`: The main application code
- - `requirements.txt`: Required Python packages
- - `run_app.sh`: One-click deployment script
 
  ## Requirements
 
- - Python 3.7+
- - Internet connection (for Voyage AI API)
@@ -1,52 +1,87 @@
  ---
  license: mit
+ title: Product Categorization Demo
+ sdk: streamlit
  emoji: 🚀
  colorFrom: purple
  colorTo: yellow
+ # sdk_version: (Streamlit doesn't typically use a fixed version here)
  ---
+ # Product Categorization App - Streamlit Demo
 
+ This is a Streamlit application for categorizing products based on their similarity to ingredients or predefined categories using AI embeddings (e.g., Voyage AI) and optional reranking (Voyage AI, OpenAI).
 
  ## Quick Start
 
+ 1. **Clone the repository:**
+ ```bash
+ git clone <repository_url>
+ cd <repository_directory>
+ ```
+ 2. **Create a virtual environment (optional but recommended):**
+ ```bash
+ python -m venv venv
+ source venv/bin/activate # On Windows use `venv\Scripts\activate`
+ ```
+ 3. **Install dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 4. **Prepare Embeddings:** Ensure your embedding files (`ingredient_embeddings_voyageai.pkl`, `category_embeddings.pickle`, etc.) are present in the `data/` directory.
+ 5. **Configure API Keys:**
+ * Copy the `.env.example` file (if it exists) or create a new file named `.env`.
+ * Add your API keys to the `.env` file:
+ ```dotenv
+ VOYAGE_API_KEY="YOUR_VOYAGE_API_KEY_HERE"
+ OPENAI_API_KEY="YOUR_OPENAI_API_KEY_HERE"
+ # Add other keys like CHICORY if needed
+ ```
+ 6. **Run the application:**
+ ```bash
+ streamlit run app.py
+ ```
+ Alternatively, if you have configured the `./run_app.sh` script:
+ ```bash
+ ./run_app.sh
+ ```
+ 7. The application will open in your default web browser.
 
+ ## Features
 
+ - **Multiple Matching Methods:**
+ - Ingredient Embeddings
+ - Category Embeddings
+ - Voyage AI Reranking (Ingredients/Categories)
+ - OpenAI Reranking (Ingredients/Categories)
+ - Comparison View across methods
+ - **Text Input:** Enter product names one per line.
+ - **Description Expansion:** Optionally use OpenAI to expand product descriptions before matching.
+ - **Adjustable Parameters:** Control Top-N results, confidence thresholds, etc. for different methods.
+ - **Example Loading:** Quickly load sample product names.
 
  ## Hosting on Hugging Face Spaces
 
+ 1. Create a free account on [Hugging Face](https://huggingface.co/).
+ 2. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
+ 3. Click "Create a new Space".
+ 4. Select "Streamlit" as the SDK.
+ 5. Choose a repository type (usually Git).
+ 6. Upload all project files (including the `data` directory with embeddings) to the space repository.
+ 7. **Important:** Add your API keys (`VOYAGE_API_KEY`, `OPENAI_API_KEY`, etc.) as **Secrets** in your Hugging Face Space settings. Do *not* commit the `.env` file directly.
+ 8. Your app should build and deploy automatically.
 
  ## Files Included
 
+ - `app.py`: The main Streamlit application entry point.
+ - `ui.py`: Defines the Streamlit UI layout and components.
+ - `*.py` (various): Backend logic for embeddings, matching, API calls, formatting.
+ - `requirements.txt`: Required Python packages.
+ - `.env`: File to store API keys (add your keys here, **do not commit**).
+ - `run_app.sh`: Example script to run the app locally.
+ - `data/`: Directory containing embedding files.
 
  ## Requirements
 
+ - Python 3.8+
+ - API keys for Voyage AI and/or OpenAI (stored in `.env`).
+ - Internet connection for API calls.
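The updated README expects API keys to come from `.env` locally or from Space Secrets on Hugging Face, and Spaces expose secrets to the app as environment variables. A minimal sketch of a startup check, assuming a hypothetical helper `missing_api_keys` that this repository does not define; which keys are truly required depends on the matching methods in use (the README says Voyage AI and/or OpenAI).

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical default; which keys are truly required depends on the methods you use.
DEFAULT_REQUIRED_KEYS = ("VOYAGE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(required: Iterable[str] = DEFAULT_REQUIRED_KEYS,
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are unset or blank in the environment.

    `environ` defaults to os.environ, which is where values loaded by
    python-dotenv's load_dotenv() and Hugging Face Space secrets both end up.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]
```

In the Streamlit app, a non-empty result could feed `st.error(...)` followed by `st.stop()`, the same pattern `app.py` uses for a missing embeddings file.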
app.py CHANGED
@@ -1,29 +1,50 @@
  import os
  import sys
- import gradio as gr
  from utils import load_embeddings
- from ui import categorize_products, create_demo # Updated imports
 
  # Path to the embeddings file
  EMBEDDINGS_PATH = "data/ingredient_embeddings_voyageai.pkl"
 
- # Check if embeddings file exists
- if not os.path.exists(EMBEDDINGS_PATH):
-     print(f"Error: Embeddings file {EMBEDDINGS_PATH} not found!")
-     print(f"Please ensure the file exists at {os.path.abspath(EMBEDDINGS_PATH)}")
-     sys.exit(1)
-
- # Load embeddings globally
- try:
-     embeddings_data = load_embeddings(EMBEDDINGS_PATH)
-     # Make embeddings available to the UI functions
-     import ui
-     ui.embeddings = embeddings_data
- except Exception as e:
-     print(f"Error loading embeddings: {e}")
-     sys.exit(1)
-
- # Launch the Gradio interface
- if __name__ == "__main__":
-     demo = create_demo()
-     demo.launch()
@@ -1,29 +1,50 @@
  import os
  import sys
+ from dotenv import load_dotenv # Import load_dotenv
+ import streamlit as st
+
+ # Load environment variables from .env file at the very beginning
+ load_dotenv()
  from utils import load_embeddings
+ from ui import render_ui # Import the new Streamlit UI function
+
+ # Set page config as the first Streamlit command
+ st.set_page_config(layout="wide", page_title="Product Categorization Tool")
+ import ui_core # Import ui_core to set embeddings
 
  # Path to the embeddings file
  EMBEDDINGS_PATH = "data/ingredient_embeddings_voyageai.pkl"
 
+ # Use Streamlit's caching to load embeddings only once
+ @st.cache_data
+ def load_all_embeddings(path):
+     """Loads embeddings from the specified path."""
+     if not os.path.exists(path):
+         st.error(f"Error: Embeddings file {path} not found!")
+         st.error(f"Please ensure the file exists at {os.path.abspath(path)}")
+         st.stop() # Stop execution if file not found
+         return None # Return None explicitly, although st.stop() halts
+
+     try:
+         embeddings_data = load_embeddings(path)
+         return embeddings_data
+     except Exception as e:
+         st.error(f"Error loading embeddings: {e}")
+         st.stop()
+         return None
+
+ # Load embeddings and make them available to UI modules
+ embeddings_data = load_all_embeddings(EMBEDDINGS_PATH)
+
+ if embeddings_data:
+     # Pass the loaded embeddings to the ui_core module where other UI modules import it from
+     ui_core.embeddings = embeddings_data
+
+     # Render the Streamlit UI
+     render_ui()
+ else:
+     # This part should ideally not be reached due to st.stop() in load_all_embeddings
+     st.error("Failed to load embeddings. Application cannot start.")
+
+ # Note: No __main__ block needed for Streamlit.
+ # Streamlit apps are run using `streamlit run app.py`
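The new `app.py` does two separate things: it memoizes the loader with `@st.cache_data`, so Streamlit's rerun-on-every-interaction model does not re-read the pickle each time, and it shares the result by assigning `ui_core.embeddings`. The stdlib sketch below mimics both, under stated assumptions. `functools.lru_cache` stands in for `st.cache_data` as an analogy only; one documented difference is that `st.cache_data` gives each caller a copy of the cached value, while `st.cache_resource` (like `lru_cache`) returns the same object, which may matter for a large embeddings dict. The module part shows that `from ui_core import embeddings`, as written at the top of the new `ui.py`, copies the binding at import time, and `app.py` imports `ui` before it assigns `ui_core.embeddings`; code that reads the attribute at call time sees the update. `load_embeddings_cached`, `fake_ui_core`, and `current_embeddings` are hypothetical names.

```python
import os
import pickle
import sys
import types
from functools import lru_cache
from typing import Any, Dict

# Counts real disk reads so the caching effect is observable.
LOAD_CALLS = {"count": 0}


@lru_cache(maxsize=None)
def load_embeddings_cached(path: str) -> Dict[str, Any]:
    """Load a pickled embeddings dict once per distinct path (analogy for st.cache_data)."""
    if not os.path.exists(path):
        # Raising keeps failures out of the cache: lru_cache stores only successful returns.
        raise FileNotFoundError(f"Embeddings file {os.path.abspath(path)} not found")
    LOAD_CALLS["count"] += 1
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for ui_core: a module object whose attribute is assigned after other imports.
fake_ui_core = types.ModuleType("fake_ui_core")
fake_ui_core.embeddings = None
sys.modules["fake_ui_core"] = fake_ui_core

# Like `from ui_core import embeddings` in ui.py: copies the value bound right now.
from fake_ui_core import embeddings as bound_at_import  # noqa: E402

# Like `ui_core.embeddings = embeddings_data` in app.py, which runs later.
fake_ui_core.embeddings = {"salt": [0.1, 0.2]}


def current_embeddings() -> Any:
    """Read the shared value as a module attribute at call time."""
    return sys.modules["fake_ui_core"].embeddings
```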
comparison.py CHANGED
@@ -8,7 +8,7 @@ from similarity import hybrid_ingredient_matching
  from api_utils import process_in_parallel, rank_ingredients_openai
  from ui_formatters import format_comparison_html, create_results_container
 
- from utils import SafeProgress
  from chicory_api import call_chicory_parser
  from embeddings import create_product_embeddings
  from similarity import compute_similarities
@@ -16,7 +16,7 @@ from similarity import compute_similarities
  def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str, Any],
  embedding_top_n: int = 20, final_top_n: int = 3,
  confidence_threshold: float = 0.5, match_type="ingredients",
- progress=None, expanded_descriptions=None) -> Dict[str, Dict[str, List[Tuple]]]:
  """
  Compare multiple ingredient/category matching methods on the same products
 
@@ -43,20 +43,21 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  else:
  print(f"WARNING: First product '{products[0] if products else 'None'}' not found in expanded descriptions")
 
- progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
 
  # Step 1: Generate embeddings for all products (used by multiple methods)
- progress_tracker(0.1, desc="Generating product embeddings")
  # Use expanded descriptions for embeddings if available
  if expanded_descriptions:
  expanded_product_texts = [expanded_descriptions.get(p, p) for p in products]
- product_embeddings = create_product_embeddings(expanded_product_texts, progress=progress_tracker,
- original_products=products) # Keep original product IDs
  else:
- product_embeddings = create_product_embeddings(products, progress=progress_tracker)
 
  # Step 2: Get embedding-based candidates for all products
- progress_tracker(0.2, desc="Finding embedding candidates")
  similarities = compute_similarities(ingredients_dict, product_embeddings)
 
  # Filter to top N candidates per product
@@ -65,11 +66,11 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  embedding_results[product] = product_similarities[:embedding_top_n]
 
  # Step 3: Process with Chicory Parser
- progress_tracker(0.3, desc="Running Chicory Parser")
  # Import here to avoid circular imports
  # from chicory_parser import parse_products
 
- chicory_results = call_chicory_parser(products, progress=progress_tracker)
 
  # Initialize result structure
  comparison_results = {}
@@ -103,7 +104,7 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  comparison_results[product]["chicory"] = chicory_matches
 
  # Step 4: Process with Voyage AI
- progress_tracker(0.4, desc="Processing with Voyage AI")
 
  # Define processing function for Voyage
  def process_voyage(product):
@@ -156,13 +157,17 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
 
  # Ensure results are in the expected format
  formatted_results = []
  for r in results[:final_top_n]:
  if isinstance(r, dict) and "name" in r and "score" in r:
  # Convert score to float to ensure type compatibility
  try:
  score = float(r["score"])
  if score >= confidence_threshold:
- formatted_results.append((r["name"], score))
  except (ValueError, TypeError):
  print(f"Invalid score format in result: {r}")
  elif isinstance(r, tuple) and len(r) >= 2:
@@ -177,7 +182,9 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  name = r[0]
 
  if score >= confidence_threshold:
- formatted_results.append((name, score))
  except (ValueError, TypeError):
  print(f"Invalid score format in tuple: {r}")
 
@@ -197,11 +204,8 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  voyage_results = process_in_parallel(
  items=products,
  processor_func=process_voyage,
- max_workers=min(20, len(products)),
- progress_tracker=progress_tracker,
- progress_start=0.4,
- progress_end=0.65,
- progress_desc="Voyage AI"
  )
 
  # Update comparison results with Voyage results
@@ -210,7 +214,7 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  comparison_results[product]["voyage"] = results
 
  # Step 5: Process with OpenAI
- progress_tracker(0.7, desc="Running OpenAI processing in parallel")
 
  # Define processing function for OpenAI
  def process_openai(product):
@@ -261,11 +265,8 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  openai_results = process_in_parallel(
  items=products,
  processor_func=process_openai,
- max_workers=min(20, len(products)),
- progress_tracker=progress_tracker,
- progress_start=0.7,
- progress_end=0.95,
- progress_desc="OpenAI"
  )
 
  # Update comparison results with OpenAI results
@@ -303,12 +304,12 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
 
  method_results[method] = formatted_results
 
- progress_tracker(1.0, desc="Comparison complete")
  return comparison_results
 
  def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  final_top_n=3, confidence_threshold=0.5,
- match_type="categories", use_expansion=False, progress=None):
  """
  Compare multiple ingredient matching methods on the same products
 
@@ -324,10 +325,12 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  Returns:
  HTML formatted comparison results
  """
- from utils import SafeProgress, load_embeddings
 
- progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
- progress_tracker(0.1, desc="Processing input")
 
  # Split text input by lines and remove empty lines
  if not product_input:
@@ -338,7 +341,7 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
 
  # Load appropriate embeddings based on match type
  try:
- progress_tracker(0.2, desc="Loading embeddings")
  if match_type == "ingredients":
  embeddings_path = "data/ingredient_embeddings_voyageai.pkl"
  embeddings_dict = load_embeddings(embeddings_path)
@@ -355,20 +358,20 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  # Expand descriptions if requested
  if use_expansion:
  from openai_expansion import expand_product_descriptions
- progress_tracker(0.25, desc="Expanding product descriptions")
- expanded_products = expand_product_descriptions(product_names, progress=progress_tracker)
  # Add at beginning of results
  header_text = f"Comparing {len(product_names)} products using multiple {match_type} matching methods WITH expanded descriptions."
 
- progress_tracker(0.3, desc="Comparing methods")
  comparison_results = compare_ingredient_methods(
  products=product_names,
  ingredients_dict=embeddings_dict,
  embedding_top_n=embedding_top_n,
  final_top_n=final_top_n,
  confidence_threshold=confidence_threshold,
- match_type=match_type,
- progress=progress_tracker,
  expanded_descriptions=expanded_products
  )
  except Exception as e:
@@ -377,7 +380,7 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  return f"<div style='color: red;'>Error comparing methods: {str(e)}<br><pre>{error_details}</pre></div>"
 
  # Format results as HTML using centralized formatters
- progress_tracker(0.9, desc="Formatting results")
  result_elements = []
  for product in product_names:
  if product in comparison_results:
@@ -393,5 +396,5 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  header_text=header_text
  )
 
- progress_tracker(1.0, desc="Complete")
  return output_html
@@ -8,7 +8,7 @@ from similarity import hybrid_ingredient_matching
  from api_utils import process_in_parallel, rank_ingredients_openai
  from ui_formatters import format_comparison_html, create_results_container
 
+ # from utils import SafeProgress # Removed SafeProgress import
  from chicory_api import call_chicory_parser
  from embeddings import create_product_embeddings
  from similarity import compute_similarities
@@ -16,7 +16,7 @@ from similarity import compute_similarities
  def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str, Any],
  embedding_top_n: int = 20, final_top_n: int = 3,
  confidence_threshold: float = 0.5, match_type="ingredients",
+ expanded_descriptions=None) -> Dict[str, Dict[str, List[Tuple]]]: # Removed progress parameter
  """
  Compare multiple ingredient/category matching methods on the same products
 
@@ -43,20 +43,21 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  else:
  print(f"WARNING: First product '{products[0] if products else 'None'}' not found in expanded descriptions")
 
+ # Removed Gradio progress tracking
+ # progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
 
  # Step 1: Generate embeddings for all products (used by multiple methods)
+ # progress_tracker(0.1, desc="Generating product embeddings") # Removed progress
  # Use expanded descriptions for embeddings if available
  if expanded_descriptions:
  expanded_product_texts = [expanded_descriptions.get(p, p) for p in products]
+ product_embeddings = create_product_embeddings(expanded_product_texts,
+ original_products=products) # Keep original product IDs, removed progress
  else:
+ product_embeddings = create_product_embeddings(products) # Removed progress
 
  # Step 2: Get embedding-based candidates for all products
+ # progress_tracker(0.2, desc="Finding embedding candidates") # Removed progress
  similarities = compute_similarities(ingredients_dict, product_embeddings)
 
  # Filter to top N candidates per product
@@ -65,11 +66,11 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  embedding_results[product] = product_similarities[:embedding_top_n]
 
  # Step 3: Process with Chicory Parser
+ # progress_tracker(0.3, desc="Running Chicory Parser") # Removed progress
  # Import here to avoid circular imports
  # from chicory_parser import parse_products
 
+ chicory_results = call_chicory_parser(products) # Removed progress
 
  # Initialize result structure
  comparison_results = {}
@@ -103,7 +104,7 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  comparison_results[product]["chicory"] = chicory_matches
 
  # Step 4: Process with Voyage AI
+ # progress_tracker(0.4, desc="Processing with Voyage AI") # Removed progress
 
  # Define processing function for Voyage
  def process_voyage(product):
@@ -156,13 +157,17 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
 
  # Ensure results are in the expected format
  formatted_results = []
+ added_ids = set() # Keep track of added category IDs to avoid duplicates
  for r in results[:final_top_n]:
  if isinstance(r, dict) and "name" in r and "score" in r:
  # Convert score to float to ensure type compatibility
  try:
  score = float(r["score"])
+ name = r["name"] # Extract name for check
  if score >= confidence_threshold:
+ if name not in added_ids: # Check for duplicates
+ formatted_results.append((name, score))
+ added_ids.add(name) # Add ID to set
  except (ValueError, TypeError):
  print(f"Invalid score format in result: {r}")
  elif isinstance(r, tuple) and len(r) >= 2:
@@ -177,7 +182,9 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  name = r[0]
 
  if score >= confidence_threshold:
+ if name not in added_ids: # Check for duplicates
+ formatted_results.append((name, score))
+ added_ids.add(name) # Add ID to set
  except (ValueError, TypeError):
  print(f"Invalid score format in tuple: {r}")
 
@@ -197,11 +204,8 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  voyage_results = process_in_parallel(
  items=products,
  processor_func=process_voyage,
+ max_workers=min(20, len(products))
+ # Removed ALL progress tracking arguments
  )
 
  # Update comparison results with Voyage results
@@ -210,7 +214,7 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  comparison_results[product]["voyage"] = results
 
  # Step 5: Process with OpenAI
+ # progress_tracker(0.7, desc="Running OpenAI processing in parallel") # Removed progress
 
  # Define processing function for OpenAI
  def process_openai(product):
@@ -261,11 +265,8 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
  openai_results = process_in_parallel(
  items=products,
  processor_func=process_openai,
+ max_workers=min(20, len(products))
+ # Removed ALL progress tracking arguments
  )
 
  # Update comparison results with OpenAI results
@@ -303,12 +304,12 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
 
  method_results[method] = formatted_results
 
+ # progress_tracker(1.0, desc="Comparison complete") # Removed progress
  return comparison_results
 
  def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  final_top_n=3, confidence_threshold=0.5,
+ match_type="categories", use_expansion=False): # Removed progress parameter
  """
  Compare multiple ingredient matching methods on the same products
 
@@ -324,10 +325,12 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  Returns:
  HTML formatted comparison results
  """
+ # from utils import SafeProgress # Removed SafeProgress import
+ from utils import load_embeddings
 
+ # Removed Gradio progress tracking
+ # progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
+ # progress_tracker(0.1, desc="Processing input")
 
  # Split text input by lines and remove empty lines
  if not product_input:
@@ -338,7 +341,7 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
 
  # Load appropriate embeddings based on match type
  try:
+ # progress_tracker(0.2, desc="Loading embeddings") # Removed progress
  if match_type == "ingredients":
  embeddings_path = "data/ingredient_embeddings_voyageai.pkl"
  embeddings_dict = load_embeddings(embeddings_path)
@@ -355,20 +358,20 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  # Expand descriptions if requested
  if use_expansion:
  from openai_expansion import expand_product_descriptions
+ # progress_tracker(0.25, desc="Expanding product descriptions") # Removed progress
+ expanded_products = expand_product_descriptions(product_names) # Removed progress argument
  # Add at beginning of results
  header_text = f"Comparing {len(product_names)} products using multiple {match_type} matching methods WITH expanded descriptions."
 
+ # progress_tracker(0.3, desc="Comparing methods") # Removed progress
  comparison_results = compare_ingredient_methods(
  products=product_names,
  ingredients_dict=embeddings_dict,
  embedding_top_n=embedding_top_n,
  final_top_n=final_top_n,
  confidence_threshold=confidence_threshold,
+ match_type=match_type, # Added missing comma
+ # Removed progress argument
  expanded_descriptions=expanded_products
  )
  except Exception as e:
@@ -377,7 +380,7 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  return f"<div style='color: red;'>Error comparing methods: {str(e)}<br><pre>{error_details}</pre></div>"
 
  # Format results as HTML using centralized formatters
+ # progress_tracker(0.9, desc="Formatting results") # Removed progress
  result_elements = []
  for product in product_names:
  if product in comparison_results:
@@ -393,5 +396,5 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
  header_text=header_text
  )
 
+ # progress_tracker(1.0, desc="Complete") # Removed progress
  return output_html
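The new comparison code skips repeated names via `added_ids`, but the visible hunks slice `results[:final_top_n]` before filtering, so a duplicate or a below-threshold entry still uses up one of the `final_top_n` slots (if the top two results share a name and `final_top_n` is 2, only one survives). A minimal sketch of the alternative ordering, which filters and de-duplicates first and truncates last. `select_top_unique` is a hypothetical name, and reading the score from index 1 of a tuple is an assumption, since the lines that extract it fall outside the shown hunks.

```python
from typing import Any, Iterable, List, Tuple


def select_top_unique(results: Iterable[Any], final_top_n: int,
                      confidence_threshold: float) -> List[Tuple[str, float]]:
    """Keep up to final_top_n unique (name, score) pairs whose score meets the threshold.

    Accepts the two result shapes handled in the diff: dicts with "name"/"score"
    keys and (name, score, ...) tuples (score at index 1 is an assumption).
    Duplicates are skipped before truncating, so they do not consume slots.
    """
    selected: List[Tuple[str, float]] = []
    seen = set()
    for r in results:
        if isinstance(r, dict) and "name" in r and "score" in r:
            name, raw_score = r["name"], r["score"]
        elif isinstance(r, tuple) and len(r) >= 2:
            name, raw_score = r[0], r[1]
        else:
            continue  # unrecognized result shape
        try:
            score = float(raw_score)
        except (ValueError, TypeError):
            continue  # unparsable score; the diff prints a warning here instead
        if score < confidence_threshold or name in seen:
            continue
        seen.add(name)
        selected.append((name, score))
        if len(selected) == final_top_n:
            break
    return selected
```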
embeddings.py CHANGED
@@ -6,8 +6,9 @@ import time
  import numpy as np
  from concurrent.futures import ThreadPoolExecutor
 
- # Set Voyage AI API key directly
- voyageai.api_key = os.getenv("VOYAGE_API_KEY")
 
  def get_embeddings_batch(texts, model="voyage-3-large", batch_size=100):
      """Get embeddings for a list of texts in batches"""
@@ -6,8 +6,9 @@ import time
  import numpy as np
  from concurrent.futures import ThreadPoolExecutor
 
+ # Voyage AI API key is now loaded via environment variable
+ # when voyageai.Client() is initialized (after load_dotenv runs)
+ # voyageai.api_key = os.getenv("VOYAGE_API_KEY") # Removed global setting
 
  def get_embeddings_batch(texts, model="voyage-3-large", batch_size=100):
      """Get embeddings for a list of texts in batches"""
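The new comment relies on `voyageai.Client()` picking up `VOYAGE_API_KEY` from the environment when it is constructed, which is what the voyageai client documents when no `api_key` argument is passed. What matters is when the variable is read: a module-level `os.getenv(...)` runs at import time and would miss a key that `load_dotenv()` sets later, whereas a factory called on first use sees it. A minimal sketch of that lazy pattern; `get_client`, `FakeClient`, and the `DEMO_VOYAGE_API_KEY` variable are hypothetical stand-ins so the example runs without the voyageai package.

```python
import os
from functools import lru_cache
from typing import Optional


class FakeClient:
    """Stand-in for a real API client; it only records the key it was given."""

    def __init__(self, api_key: Optional[str]) -> None:
        self.api_key = api_key


# Read at import time: captures whatever the environment holds right now.
KEY_AT_IMPORT = os.getenv("DEMO_VOYAGE_API_KEY")


@lru_cache(maxsize=1)
def get_client() -> FakeClient:
    """Build the client on first use, reading the key from the environment at that moment."""
    api_key = os.getenv("DEMO_VOYAGE_API_KEY")
    if not api_key:
        # Raising means nothing is cached, so a later call can succeed once the key is set.
        raise RuntimeError("DEMO_VOYAGE_API_KEY is not set; check .env or the Space secrets")
    return FakeClient(api_key)
```

In `app.py`, `load_dotenv()` already runs before the other project modules are imported, so either style works there; the lazy form simply stops depending on import order.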
requirements.txt CHANGED
@@ -1,6 +1,8 @@
  voyageai
  numpy
- gradio
  openai
  requests
  tqdm
@@ -1,6 +1,8 @@
  voyageai
  numpy
+ streamlit
+ pandas
  openai
  requests
  tqdm
+ python-dotenv
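The new `ui.py` in the following diff initializes five `st.session_state` keys at module level. Because Python caches imported modules in `sys.modules`, module-level code in an imported file generally runs once per server process, not on every Streamlit rerun or for each new browser session, so a session that starts later may not get those keys. A minimal sketch of one alternative, assuming a hypothetical `init_session_defaults` helper called at the start of `render_ui()`; a plain dict stands in for `st.session_state`, which supports the same `in` check and item assignment.

```python
from typing import Any, Dict, MutableMapping

# Defaults mirroring the keys initialized in the new ui.py.
SESSION_DEFAULTS: Dict[str, Any] = {
    "ingredient_input": "",
    "category_input": "",
    "voyage_input": "",
    "openai_input": "",
    "compare_input": "",
}


def init_session_defaults(state: MutableMapping[str, Any],
                          defaults: Dict[str, Any] = SESSION_DEFAULTS) -> None:
    """Set each default only when the key is absent, so existing input survives reruns."""
    for key, value in defaults.items():
        if key not in state:
            state[key] = value
```

Called as `init_session_defaults(st.session_state)` inside `render_ui()`, this would run on every rerun of every session, and because it never overwrites an existing key it is safe to repeat.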
ui.py CHANGED
@@ -1,222 +1,258 @@
- import gradio as gr
  from comparison import compare_ingredient_methods_ui
-
- # Import from our UI modules
- from ui_core import embeddings, get_css, load_examples
  from ui_ingredient_matching import categorize_products
  from ui_category_matching import categorize_products_by_category
  from ui_hybrid_matching import categorize_products_with_voyage_reranking
  from ui_expanded_matching import categorize_products_with_openai_reranking
 
- def create_demo():
- """Create the Gradio interface"""
- with gr.Blocks(css=get_css()) as demo:
- gr.Markdown("# Product Categorization Tool\nAnalyze products by matching to ingredients or categories using AI embeddings.")
-
- with gr.Tabs() as tabs:
- # Original Ingredient Matching Tab
- with gr.TabItem("Ingredient Embeddings"):
- with gr.Row():
- with gr.Column(scale=1):
- # Input section
- text_input = gr.Textbox(
- lines=10,
- placeholder="Enter product names, one per line",
- label="Product Names"
- )
- input_controls = gr.Row()
- with input_controls:
- use_expansion = gr.Checkbox(
- value=False,
- label="Use Description Expansion",
- info="Expand product descriptions using AI before matching"
- )
- top_n = gr.Slider(1, 25, 10, step=1, label="Top N Results")
- confidence = gr.Slider(0.1, 0.9, 0.5, label="Similarity Threshold")
-
- with gr.Row():
- examples_btn = gr.Button("Load Examples", variant="secondary")
- categorize_btn = gr.Button("Find Similar Ingredients", variant="primary")
-
- with gr.Column(scale=1):
- # Results section
- text_output = gr.HTML(label="Similar Ingredients Results", elem_id="results-container")
-
-
- # New Category Matching Tab
- with gr.TabItem("Category Embeddings"):
- with gr.Row():
- with gr.Column(scale=1):
- # Input section
- category_text_input = gr.Textbox(
- lines=10,
- placeholder="Enter product names, one per line",
- label="Product Names"
- )
- category_input_controls = gr.Row()
- with category_input_controls:
- category_use_expansion = gr.Checkbox(
- value=False,
- label="Use Description Expansion",
- info="Expand product descriptions using AI before matching"
- )
- category_top_n = gr.Slider(1, 10, 5, step=1, label="Top N Categories")
- category_confidence = gr.Slider(0.1, 0.9, 0.5, label="Matching Threshold")
-
- with gr.Row():
- category_examples_btn = gr.Button("Load Examples", variant="secondary")
- match_categories_btn = gr.Button("Match to Categories", variant="primary")
-
- with gr.Column(scale=1):
- # Results section
- category_output = gr.HTML(label="Category Matching Results", elem_id="results-container")
-
- # Common function to create reranking UI tabs
- def create_reranking_tab(tab_name, fn_name, default_match="ingredients"):
- with gr.TabItem(tab_name):
- with gr.Row():
- with gr.Column(scale=1):
- # Input section
- tab_input = gr.Textbox(
- lines=10,
- placeholder="Enter product names, one per line",
- label="Product Names"
- )
- with gr.Row():
- tab_expansion = gr.Checkbox(
- value=False,
- label="Use Description Expansion",
- info="Expand product descriptions using AI before matching"
- )
- tab_emb_top_n = gr.Slider(1, 50, 20, step=1, label="Embedding Top N Results")
- tab_top_n = gr.Slider(1, 10, 5, step=1, label="Final Top N Results")
- tab_confidence = gr.Slider(0.1, 0.9, 0.5, label="Matching Threshold")
-
- tab_match_type = gr.Radio(
- choices=["categories", "ingredients"],
- value=default_match,
- label="Match Type",
- info="Choose whether to match against ingredients or categories"
- )
-
- with gr.Row():
- tab_examples_btn = gr.Button("Load Examples", variant="secondary")
- tab_match_btn = gr.Button(f"Match using {tab_name}", variant="primary")
-
- with gr.Column(scale=1):
- # Results section
- tab_output = gr.HTML(label=f"{tab_name} Results", elem_id="results-container")
-
- # Connect button events
- tab_match_btn.click(
- fn=fn_name,
- inputs=[tab_input, gr.State(False), tab_expansion, tab_emb_top_n,
- tab_top_n, tab_confidence, tab_match_type],
- outputs=[tab_output],
  )
-
- tab_examples_btn.click(
- fn=load_examples,
- inputs=[],
- outputs=tab_input
  )
-
- # Create the reranking tabs using the shared function
- create_reranking_tab("Voyage AI Reranking", categorize_products_with_voyage_reranking, "categories")
- create_reranking_tab("OpenAI Reranking", categorize_products_with_openai_reranking, "categories")
-
- # New Comparison Tab
- with gr.TabItem("Compare Methods"):
- with gr.Row():
- with gr.Column():
- compare_product_input = gr.Textbox(
- label="Enter product names (one per line)",
- placeholder="4 Tbsp sweet pickle relish\nchocolate chips\nfresh parsley",
- lines=5
- )
-
- with gr.Row():
- compare_embedding_top_n = gr.Slider(
- minimum=5, maximum=50, value=20, step=5,
- label="Initial embedding candidates"
- )
- compare_final_top_n = gr.Slider(
- minimum=1, maximum=10, value=3, step=1,
- label="Final results per method"
- )
- compare_confidence_threshold = gr.Slider(
- minimum=0.0, maximum=1.0, value=0.5, step=0.05,
- label="Confidence threshold"
- )
-
- compare_match_type = gr.Radio(
- choices=["categories", "ingredients"],
- value="categories",
- label="Match Type",
- info="Choose whether to match against ingredients or categories"
- )
-
- # Add expansion checkbox
- compare_expansion = gr.Checkbox(
- value=False,
- label="Use Description Expansion",
- info="Expand product descriptions using AI before matching"
  )
-
- compare_btn = gr.Button("Compare Methods", variant="primary")
- compare_examples_btn = gr.Button("Load Examples", variant="secondary")
-
- with gr.Column():
- comparison_output = gr.HTML(label="Results", elem_id="results-container")
-
- # Connect the compare button
- compare_btn.click(
- fn=compare_ingredient_methods_ui,
- inputs=[
- compare_product_input,
  compare_embedding_top_n,
  compare_final_top_n,
  compare_confidence_threshold,
  compare_match_type,
  compare_expansion
- ],
- outputs=comparison_output
- )
-
- # Add examples button functionality
- compare_examples_btn.click(
- fn=load_examples,
- inputs=[],
- outputs=compare_product_input
- )
-
- # Connect buttons for ingredient matching
- categorize_btn.click(
- fn=categorize_products,
- inputs=[text_input, gr.State(False), use_expansion, top_n, confidence],
- outputs=[text_output],
- )
-
- # Connect buttons for category matching
- match_categories_btn.click(
- fn=categorize_products_by_category,
- inputs=[category_text_input, gr.State(False), category_use_expansion, category_top_n, category_confidence],
- outputs=[category_output],
- )
-
- # Examples buttons for the first two tabs
- examples_btn.click(
- fn=load_examples,
- inputs=[],
- outputs=text_input
- )
-
- category_examples_btn.click(
- fn=load_examples, # Reuse the same examples
- inputs=[],
- outputs=category_text_input
- )
-
- gr.Markdown("Powered by Voyage AI embeddings • Built with Gradio")
-
- return demo
1
+ import streamlit as st
2
+ import pandas as pd
3
  from comparison import compare_ingredient_methods_ui
4
+ from ui_core import embeddings, load_examples
5
  from ui_ingredient_matching import categorize_products
6
  from ui_category_matching import categorize_products_by_category
7
  from ui_hybrid_matching import categorize_products_with_voyage_reranking
8
  from ui_expanded_matching import categorize_products_with_openai_reranking
9
+ # Removed unused import: from ui_formatters import format_results_html
10
 
11
+ # Initialize session state keys if they don't exist
12
+ if 'ingredient_input' not in st.session_state:
13
+ st.session_state.ingredient_input = ""
14
+ if 'category_input' not in st.session_state:
15
+ st.session_state.category_input = ""
16
+ if 'voyage_input' not in st.session_state:
17
+ st.session_state.voyage_input = ""
18
+ if 'openai_input' not in st.session_state:
19
+ st.session_state.openai_input = ""
20
+ if 'compare_input' not in st.session_state:
21
+ st.session_state.compare_input = ""
22
+
23
+
24
+ def render_ui():
25
+ """Render the Streamlit interface"""
26
+ # Page config is now set in app.py
27
+ st.title("Product Categorization Tool")
28
+ st.markdown("Analyze products by matching to ingredients or categories using AI embeddings.")
29
+
30
+ # Use st.tabs for the different sections
31
+ tab_ingredient, tab_category, tab_voyage, tab_openai, tab_compare = st.tabs([
32
+ "Ingredient Embeddings",
33
+ "Category Embeddings",
34
+ "Voyage AI Reranking",
35
+ "OpenAI Reranking",
36
+ "Compare Methods"
37
+ ])
38
+
39
+ # --- Ingredient Matching Tab ---
40
+ with tab_ingredient:
41
+ st.header("Match Products to Ingredients")
42
+ col1, col2 = st.columns(2)
43
+ with col1:
44
+ # Handle button click *before* rendering the text area
45
+ if st.button("Load Examples", key="ingredient_examples"):
46
+ st.session_state.ingredient_input = load_examples() # Update state for next rerun
47
+
48
+ # Input section - Use the session state value
49
+ text_input = st.text_area(
50
+ "Product Names (one per line)",
51
+ value=st.session_state.ingredient_input, # Use value from state
52
+ placeholder="Enter product names, one per line",
53
+ height=250,
54
+ key="ingredient_input_widget" # Use a different key for the widget itself if needed, or manage via value
55
+ )
56
+ # Update session state if user types manually
57
+ st.session_state.ingredient_input = text_input
58
+
59
+ use_expansion = st.checkbox(
60
+ "Use Description Expansion (AI)",
61
+ value=False,
62
+ key="ingredient_expansion",
63
+ help="Expand product descriptions using AI before matching"
64
+ )
65
+ top_n = st.slider("Top N Results", 1, 25, 10, step=1, key="ingredient_top_n")
66
+ confidence = st.slider("Similarity Threshold", 0.1, 0.9, 0.5, step=0.05, key="ingredient_confidence")
67
+
68
+ find_ingredients_btn = st.button("Find Similar Ingredients", type="primary", key="ingredient_find")
69
+
70
+ with col2:
71
+ # Results section
72
+ st.subheader("Results")
73
+ results_placeholder_ingredient = st.empty()
74
+ if find_ingredients_btn:
75
+ if st.session_state.ingredient_input: # Check state value
76
+ results_html = categorize_products(
77
+ st.session_state.ingredient_input,
78
+ False,
79
+ use_expansion,
80
+ top_n,
81
+ confidence
82
  )
83
+ results_placeholder_ingredient.markdown(results_html, unsafe_allow_html=True)
84
+ else:
85
+ results_placeholder_ingredient.warning("Please enter product names.")
86
+
87
+ # --- Category Matching Tab ---
88
+ with tab_category:
89
+ st.header("Match Products to Categories")
90
+ col1, col2 = st.columns(2)
91
+ with col1:
92
+ if st.button("Load Examples", key="category_examples"):
93
+ st.session_state.category_input = load_examples()
94
+
95
+ category_text_input = st.text_area(
96
+ "Product Names (one per line)",
97
+ value=st.session_state.category_input,
98
+ placeholder="Enter product names, one per line",
99
+ height=250,
100
+ key="category_input_widget"
101
+ )
102
+ st.session_state.category_input = category_text_input
103
+
104
+ category_use_expansion = st.checkbox(
105
+ "Use Description Expansion (AI)",
106
+ value=False,
107
+ key="category_expansion",
108
+ help="Expand product descriptions using AI before matching"
109
+ )
110
+ category_top_n = st.slider("Top N Categories", 1, 10, 5, step=1, key="category_top_n")
111
+ category_confidence = st.slider("Matching Threshold", 0.1, 0.9, 0.5, step=0.05, key="category_confidence")
112
+
113
+ match_categories_btn = st.button("Match to Categories", type="primary", key="category_match")
114
+
115
+ with col2:
116
+ st.subheader("Results")
117
+ results_placeholder_category = st.empty()
118
+ if match_categories_btn:
119
+ if st.session_state.category_input:
120
+ results_html = categorize_products_by_category(
121
+ st.session_state.category_input,
122
+ False,
123
+ category_use_expansion,
124
+ category_top_n,
125
+ category_confidence
126
  )
127
+ results_placeholder_category.markdown(results_html, unsafe_allow_html=True)
128
+ else:
129
+ results_placeholder_category.warning("Please enter product names.")
130
+
131
+ # --- Common function for Reranking Tabs ---
132
+ def create_reranking_ui(tab, tab_key_prefix, tab_name, backend_function, default_match="categories"):
133
+ with tab:
134
+ st.header(f"Match using {tab_name}")
135
+ col1, col2 = st.columns(2)
136
+ with col1:
137
+ if st.button("Load Examples", key=f"{tab_key_prefix}_examples"):
138
+ st.session_state[f"{tab_key_prefix}_input"] = load_examples()
139
+
140
+ tab_input_value = st.text_area(
141
+ "Product Names (one per line)",
142
+ value=st.session_state[f"{tab_key_prefix}_input"],
143
+ placeholder="Enter product names, one per line",
144
+ height=250,
145
+ key=f"{tab_key_prefix}_input_widget"
146
+ )
147
+ st.session_state[f"{tab_key_prefix}_input"] = tab_input_value # Update state
148
+
149
+ tab_expansion = st.checkbox(
150
+ "Use Description Expansion (AI)",
151
+ value=False,
152
+ key=f"{tab_key_prefix}_expansion",
153
+ help="Expand product descriptions using AI before matching"
154
+ )
155
+ tab_emb_top_n = st.slider("Embedding Top N Results", 1, 50, 20, step=1, key=f"{tab_key_prefix}_emb_top_n")
156
+ tab_top_n = st.slider("Final Top N Results", 1, 10, 5, step=1, key=f"{tab_key_prefix}_final_top_n")
157
+ tab_confidence = st.slider("Matching Threshold", 0.1, 0.9, 0.5, step=0.05, key=f"{tab_key_prefix}_confidence")
158
+ tab_match_type = st.radio(
159
+ "Match Type",
160
+ options=["categories", "ingredients"],
161
+ index=0 if default_match == "categories" else 1,
162
+ key=f"{tab_key_prefix}_match_type",
163
+ horizontal=True,
164
+ help="Choose whether to match against ingredients or categories"
165
+ )
166
+
167
+ tab_match_btn = st.button(f"Match using {tab_name}", type="primary", key=f"{tab_key_prefix}_match")
168
+
169
+ with col2:
170
+ st.subheader("Results")
171
+ results_placeholder_rerank = st.empty()
172
+ if tab_match_btn:
173
+ if st.session_state[f"{tab_key_prefix}_input"]:
174
+ results_html = backend_function(
175
+ st.session_state[f"{tab_key_prefix}_input"],
176
+ False,
177
+ tab_expansion,
178
+ tab_emb_top_n,
179
+ tab_top_n,
180
+ tab_confidence,
181
+ tab_match_type
182
  )
183
+ results_placeholder_rerank.markdown(results_html, unsafe_allow_html=True)
184
+ else:
185
+ results_placeholder_rerank.warning("Please enter product names.")
186
+
187
+ # Create the reranking tabs
188
+ create_reranking_ui(tab_voyage, "voyage", "Voyage AI Reranking", categorize_products_with_voyage_reranking, "categories")
189
+ create_reranking_ui(tab_openai, "openai", "OpenAI Reranking", categorize_products_with_openai_reranking, "categories")
190
+
191
+ # --- Compare Methods Tab ---
192
+ with tab_compare:
193
+ st.header("Compare Matching Methods")
194
+ col1, col2 = st.columns(2)
195
+ with col1:
196
+ if st.button("Load Examples", key="compare_examples"):
197
+ st.session_state.compare_input = load_examples()
198
+
199
+ compare_product_input_value = st.text_area(
200
+ "Product Names (one per line)",
201
+ value=st.session_state.compare_input,
202
+ placeholder="4 Tbsp sweet pickle relish\nchocolate chips\nfresh parsley",
203
+ height=200,
204
+ key="compare_input_widget"
205
+ )
206
+ st.session_state.compare_input = compare_product_input_value # Update state
207
+
208
+ compare_embedding_top_n = st.slider(
209
+ "Initial embedding candidates",
210
+ min_value=5, max_value=50, value=20, step=5,
211
+ key="compare_emb_top_n"
212
+ )
213
+ compare_final_top_n = st.slider(
214
+ "Final results per method",
215
+ min_value=1, max_value=10, value=3, step=1,
216
+ key="compare_final_top_n"
217
+ )
218
+ compare_confidence_threshold = st.slider(
219
+ "Confidence threshold",
220
+ min_value=0.0, max_value=1.0, value=0.5, step=0.05,
221
+ key="compare_confidence"
222
+ )
223
+ compare_match_type = st.radio(
224
+ "Match Type",
225
+ options=["categories", "ingredients"],
226
+ index=0,
227
+ key="compare_match_type",
228
+ horizontal=True,
229
+ help="Choose whether to match against ingredients or categories"
230
+ )
231
+ compare_expansion = st.checkbox(
232
+ "Use Description Expansion (AI)",
233
+ value=False,
234
+ key="compare_expansion",
235
+ help="Expand product descriptions using AI before matching"
236
+ )
237
+
238
+ compare_btn = st.button("Compare Methods", type="primary", key="compare_run")
239
+
240
+ with col2:
241
+ st.subheader("Comparison Results")
242
+ results_placeholder_compare = st.empty()
243
+ if compare_btn:
244
+ if st.session_state.compare_input:
245
+ results_html = compare_ingredient_methods_ui(
246
+ st.session_state.compare_input,
247
  compare_embedding_top_n,
248
  compare_final_top_n,
249
  compare_confidence_threshold,
250
  compare_match_type,
251
  compare_expansion
252
+ )
253
+ results_placeholder_compare.markdown(results_html, unsafe_allow_html=True)
254
+ else:
255
+ results_placeholder_compare.warning("Please enter product names.")
256
+
257
+ st.markdown("---")
258
+ st.markdown("Powered by Voyage AI embeddings • Built with Streamlit")
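The five `if ... not in st.session_state` blocks at the top of the new `ui.py` all follow one pattern. Below is a minimal sketch of collapsing them into a loop. It assumes only the dict-style `in` check and item assignment that the diff already uses on `st.session_state`. `INPUT_KEYS` and `init_input_state` are hypothetical names and are not part of this commit.

```python
from collections.abc import Iterable, MutableMapping

# The five text-area state keys that the new ui.py initialises one by one.
INPUT_KEYS = (
    "ingredient_input",
    "category_input",
    "voyage_input",
    "openai_input",
    "compare_input",
)


def init_input_state(state: MutableMapping, keys: Iterable[str] = INPUT_KEYS, default: str = "") -> None:
    """Give every key a default value without overwriting existing entries.

    In the app, `state` would be st.session_state. Any dict-like mapping
    works, so the helper can be tested without running Streamlit.
    """
    for key in keys:
        if key not in state:
            state[key] = default
```

In the app this would be called once as `init_input_state(st.session_state)`. With this approach, adding a sixth tab only requires adding one key to the tuple.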
 
ui_category_matching.py CHANGED
@@ -1,5 +1,5 @@
1
- import gradio as gr
2
- from utils import SafeProgress
3
  from category_matching import load_categories, match_products_to_categories
4
  from ui_core import parse_input
5
  from ui_formatters import format_categories_html
@@ -8,8 +8,9 @@ from openai_expansion import expand_product_descriptions
8
  def categorize_products_by_category(product_input, is_file=False, use_expansion=False, top_n=10, confidence_threshold=0.5):
9
 
10
  """Categorize products by matching them to predefined categories"""
11
- progress_tracker = SafeProgress(gr.Progress())
12
- progress_tracker(0, desc="Starting categorization...")
 
13
 
14
  # Parse input
15
  product_names, error = parse_input(product_input, is_file)
@@ -19,15 +20,15 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
19
  # Optional description expansion
20
  expanded_descriptions = {}
21
  if use_expansion:
22
- progress_tracker(0.1, desc="Expanding product descriptions...")
23
- expanded_descriptions = expand_product_descriptions(product_names, progress=progress_tracker)
24
  # Use expanded descriptions for matching if available
25
  products_to_match = [expanded_descriptions.get(p, p) for p in product_names]
26
  else:
27
  products_to_match = product_names
28
 
29
  # Load categories
30
- progress_tracker(0.2, desc="Loading categories...")
31
  categories = load_categories()
32
 
33
  # Create a mapping from original product names to expanded versions
@@ -37,13 +38,13 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
37
  product_to_expanded[product] = products_to_match[i]
38
 
39
  # Match products to categories
40
- progress_tracker(0.3, desc="Matching products to categories...")
41
  match_results = match_products_to_categories(
42
  products_to_match,
43
  categories,
44
  top_n=int(top_n),
45
- confidence_threshold=confidence_threshold,
46
- progress=progress_tracker
47
  )
48
 
49
  # Create a new dictionary mapping original product names to their results
@@ -53,7 +54,7 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
53
  original_product_results[product] = match_results[expanded]
54
 
55
  # Format results
56
- progress_tracker(0.9, desc="Formatting results...")
57
  output_html = "<div style='font-family: Arial, sans-serif; max-width: 100%; overflow-x: auto;'>"
58
  output_html += f"<p style='color: #555;'>Matched {len(product_names)} products to categories.</p>"
59
 
@@ -75,5 +76,5 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
75
  if not match_results:
76
  output_html = "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>No results found. Please check your input or try different products.</div>"
77
 
78
- progress_tracker(1.0, desc="Done!")
79
  return output_html
 
1
+ # import gradio as gr # Removed Gradio import
2
+ # from utils import SafeProgress # Removed SafeProgress import
3
  from category_matching import load_categories, match_products_to_categories
4
  from ui_core import parse_input
5
  from ui_formatters import format_categories_html
 
8
  def categorize_products_by_category(product_input, is_file=False, use_expansion=False, top_n=10, confidence_threshold=0.5):
9
 
10
  """Categorize products by matching them to predefined categories"""
11
+ # Removed Gradio progress tracking
12
+ # progress_tracker = SafeProgress(gr.Progress())
13
+ # progress_tracker(0, desc="Starting categorization...")
14
 
15
  # Parse input
16
  product_names, error = parse_input(product_input, is_file)
 
20
  # Optional description expansion
21
  expanded_descriptions = {}
22
  if use_expansion:
23
+ # progress_tracker(0.1, desc="Expanding product descriptions...") # Removed progress
24
+ expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
25
  # Use expanded descriptions for matching if available
26
  products_to_match = [expanded_descriptions.get(p, p) for p in product_names]
27
  else:
28
  products_to_match = product_names
29
 
30
  # Load categories
31
+ # progress_tracker(0.2, desc="Loading categories...") # Removed progress
32
  categories = load_categories()
33
 
34
  # Create a mapping from original product names to expanded versions
 
38
  product_to_expanded[product] = products_to_match[i]
39
 
40
  # Match products to categories
41
+ # progress_tracker(0.3, desc="Matching products to categories...") # Removed progress
42
  match_results = match_products_to_categories(
43
  products_to_match,
44
  categories,
45
  top_n=int(top_n),
46
+ confidence_threshold=confidence_threshold
47
+ # Removed progress argument
48
  )
49
 
50
  # Create a new dictionary mapping original product names to their results
 
54
  original_product_results[product] = match_results[expanded]
55
 
56
  # Format results
57
+ # progress_tracker(0.9, desc="Formatting results...") # Removed progress
58
  output_html = "<div style='font-family: Arial, sans-serif; max-width: 100%; overflow-x: auto;'>"
59
  output_html += f"<p style='color: #555;'>Matched {len(product_names)} products to categories.</p>"
60
 
 
76
  if not match_results:
77
  output_html = "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>No results found. Please check your input or try different products.</div>"
78
 
79
+ # progress_tracker(1.0, desc="Done!") # Removed progress
80
  return output_html
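This commit comments out every `progress_tracker(...)` call instead of porting it. One alternative is to have the backend accept an optional, framework-neutral callback, so it no longer needs to import a UI library. The sketch below uses a hypothetical `categorize_with_progress` in place of `categorize_products_by_category`. Its "matching" step is a placeholder, not the real category matcher.

```python
from typing import Callable, List, Optional

# A progress callback receives a completion fraction in [0, 1] and a message.
ProgressFn = Callable[[float, str], None]


def _noop_progress(fraction: float, desc: str) -> None:
    """Default callback: silently discard progress updates."""


def categorize_with_progress(product_names: List[str], progress: Optional[ProgressFn] = None) -> List[str]:
    """Hypothetical backend that reports progress only if a callback is given."""
    report = progress or _noop_progress
    report(0.0, "Starting categorization...")
    results = []
    for i, name in enumerate(product_names):
        # Placeholder for the real matching work (not the app's matcher).
        results.append(name.strip().lower())
        report((i + 1) / len(product_names), f"Matched {name}")
    report(1.0, "Done!")
    return results
```

In Streamlit, the callback could wrap a progress bar: create it with `bar = st.progress(0.0)`, then pass `progress=lambda f, d: bar.progress(f, text=d)`. The `text` argument assumes a reasonably recent Streamlit release. Callers that pass nothing, such as tests or batch scripts, get the no-op default.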
ui_expanded_matching.py CHANGED
@@ -1,5 +1,5 @@
1
- import gradio as gr
2
- from utils import SafeProgress
3
  from embeddings import create_product_embeddings
4
  from similarity import compute_similarities
5
  from openai_expansion import expand_product_descriptions
@@ -11,12 +11,13 @@ import json
11
 
12
  def categorize_products_with_openai_reranking(product_input, is_file=False, use_expansion=False,
13
  embedding_top_n=20, top_n=10, confidence_threshold=0.5,
14
- match_type="ingredients", progress=gr.Progress()):
15
  """
16
  Categorize products using OpenAI reranking with optional description expansion
17
  """
18
- progress_tracker = SafeProgress(progress)
19
- progress_tracker(0, desc="Starting OpenAI reranking...")
 
20
  # Parse input
21
  product_names, error = parse_input(product_input, is_file)
22
  if error:
@@ -28,8 +29,8 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
28
  # Optional description expansion
29
  expanded_descriptions = {}
30
  if use_expansion:
31
- progress_tracker(0.2, desc="Expanding product descriptions...")
32
- expanded_descriptions = expand_product_descriptions(product_names, progress=progress)
33
 
34
  # Get shared OpenAI client
35
  openai_client = get_openai_client()
@@ -38,13 +39,13 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
38
 
39
  if match_type == "ingredients":
40
  # Generate product embeddings
41
- progress_tracker(0.4, desc="Generating product embeddings...")
42
  if use_expansion and expanded_descriptions:
43
  # Use expanded descriptions for embedding creation when available
44
  products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
45
  # Map expanded descriptions back to original product names for consistent keys
46
  product_embeddings = {}
47
- temp_embeddings = create_product_embeddings(products_for_embedding, progress=progress)
48
 
49
  # Ensure we use original product names as keys
50
  for i, product_name in enumerate(product_names):
@@ -52,10 +53,10 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
52
  product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
53
  else:
54
  # Standard embedding creation with just product names
55
- product_embeddings = create_product_embeddings(product_names, progress=progress)
56
 
57
  # Compute embedding similarities for ingredients
58
- progress_tracker(0.6, desc="Computing ingredient similarities...")
59
  all_similarities = compute_similarities(embeddings, product_embeddings)
60
 
61
  print(f"product_names: {product_names}")
@@ -65,7 +66,7 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
65
  if not all_similarities:
66
  return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No similarities found. Please try different product names.</div>"
67
 
68
- progress_tracker(0.7, desc="Re-ranking with OpenAI...")
69
 
70
  # Function for processing each product
71
  def process_reranking(product):
@@ -104,29 +105,26 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
104
  final_results = process_in_parallel(
105
  items=product_names,
106
  processor_func=process_reranking,
107
- max_workers=min(10, len(product_names)),
108
- progress_tracker=progress_tracker,
109
- progress_start=0.7,
110
- progress_end=0.9,
111
- progress_desc="Re-ranking"
112
- )
113
 
114
  else: # categories
115
  # Load category embeddings instead of JSON categories
116
- progress_tracker(0.5, desc="Loading category embeddings...")
117
  category_embeddings = load_category_embeddings()
118
 
119
  if not category_embeddings:
120
  return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No category embeddings found. Please check that the embeddings file exists at data/category_embeddings.pickle.</div>"
121
 
122
  # Generate product embeddings
123
- progress_tracker(0.6, desc="Generating product embeddings...")
124
  if use_expansion and expanded_descriptions:
125
  # Use expanded descriptions for embedding creation when available
126
  products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
127
  # Map expanded descriptions back to original product names for consistent keys
128
  product_embeddings = {}
129
- temp_embeddings = create_product_embeddings(products_for_embedding, progress=progress)
130
 
131
  # Ensure we use original product names as keys
132
  for i, product_name in enumerate(product_names):
@@ -134,10 +132,10 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
134
  product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
135
  else:
136
  # Standard embedding creation with just product names
137
- product_embeddings = create_product_embeddings(product_names, progress=progress)
138
 
139
  # Compute embedding similarities for categories
140
- progress_tracker(0.7, desc="Computing category similarities...")
141
  all_similarities = compute_similarities(category_embeddings, product_embeddings)
142
 
143
  if not all_similarities:
@@ -150,7 +148,7 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
150
  needed_category_ids.add(category_id)
151
 
152
  # Load only the needed categories from JSON
153
- progress_tracker(0.75, desc="Loading category descriptions...")
154
  category_descriptions = {}
155
  if needed_category_ids:
156
  try:
@@ -211,15 +209,12 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
211
  final_results = process_in_parallel(
212
  items=product_names,
213
  processor_func=process_category_matching,
214
- max_workers=min(10, len(product_names)),
215
- progress_tracker=progress_tracker,
216
- progress_start=0.7,
217
- progress_end=0.9,
218
- progress_desc="Category matching"
219
- )
220
 
221
  # Format results
222
- progress_tracker(0.9, desc="Formatting results...")
223
 
224
  # Create a list of result dictionaries in consistent format
225
  formatted_results = []
@@ -259,5 +254,5 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
259
  confidence_threshold=confidence_threshold # Pass the threshold to the formatter
260
  )
261
 
262
- progress_tracker(1.0, desc="Done!")
263
  return result_html
 
1
+ # import gradio as gr # Removed Gradio import
2
+ # from utils import SafeProgress # Removed SafeProgress import
3
  from embeddings import create_product_embeddings
4
  from similarity import compute_similarities
5
  from openai_expansion import expand_product_descriptions
 
11
 
12
  def categorize_products_with_openai_reranking(product_input, is_file=False, use_expansion=False,
13
  embedding_top_n=20, top_n=10, confidence_threshold=0.5,
14
+ match_type="ingredients"): # Removed progress parameter
15
  """
16
  Categorize products using OpenAI reranking with optional description expansion
17
  """
18
+ # Removed Gradio progress tracking
19
+ # progress_tracker = SafeProgress(progress)
20
+ # progress_tracker(0, desc="Starting OpenAI reranking...")
21
  # Parse input
22
  product_names, error = parse_input(product_input, is_file)
23
  if error:
 
29
  # Optional description expansion
30
  expanded_descriptions = {}
31
  if use_expansion:
32
+ # progress_tracker(0.2, desc="Expanding product descriptions...") # Removed progress
33
+ expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
34
 
35
  # Get shared OpenAI client
36
  openai_client = get_openai_client()
 
39
 
40
  if match_type == "ingredients":
41
  # Generate product embeddings
42
+ # progress_tracker(0.4, desc="Generating product embeddings...") # Removed progress
43
  if use_expansion and expanded_descriptions:
44
  # Use expanded descriptions for embedding creation when available
45
  products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
46
  # Map expanded descriptions back to original product names for consistent keys
47
  product_embeddings = {}
48
+ temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names
49
 
50
  # Ensure we use original product names as keys
51
  for i, product_name in enumerate(product_names):
 
53
  product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
54
  else:
55
  # Standard embedding creation with just product names
56
+ product_embeddings = create_product_embeddings(product_names) # Removed progress
57
 
58
  # Compute embedding similarities for ingredients
59
+ # progress_tracker(0.6, desc="Computing ingredient similarities...") # Removed progress
60
  all_similarities = compute_similarities(embeddings, product_embeddings)
61
 
62
  print(f"product_names: {product_names}")
 
66
  if not all_similarities:
67
  return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No similarities found. Please try different product names.</div>"
68
 
69
+ # progress_tracker(0.7, desc="Re-ranking with OpenAI...") # Removed progress
70
 
71
  # Function for processing each product
72
  def process_reranking(product):
 
105
  final_results = process_in_parallel(
106
  items=product_names,
107
  processor_func=process_reranking,
108
+ max_workers=min(10, len(product_names)) # Moved max_workers inside
109
+ # Removed progress tracking arguments
110
+ ) # Corrected closing parenthesis
111
 
112
  else: # categories
113
  # Load category embeddings instead of JSON categories
114
+ # progress_tracker(0.5, desc="Loading category embeddings...") # Removed progress
115
  category_embeddings = load_category_embeddings()
116
 
117
  if not category_embeddings:
118
  return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No category embeddings found. Please check that the embeddings file exists at data/category_embeddings.pickle.</div>"
119
 
120
  # Generate product embeddings
121
+ # progress_tracker(0.6, desc="Generating product embeddings...") # Removed progress
122
  if use_expansion and expanded_descriptions:
123
  # Use expanded descriptions for embedding creation when available
124
  products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
125
  # Map expanded descriptions back to original product names for consistent keys
126
  product_embeddings = {}
127
+ temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names
128
 
129
  # Ensure we use original product names as keys
130
  for i, product_name in enumerate(product_names):
 
132
  product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
133
  else:
134
  # Standard embedding creation with just product names
135
+ product_embeddings = create_product_embeddings(product_names) # Removed progress
136
 
137
  # Compute embedding similarities for categories
138
+ # progress_tracker(0.7, desc="Computing category similarities...") # Removed progress
139
  all_similarities = compute_similarities(category_embeddings, product_embeddings)
140
 
141
  if not all_similarities:
 
148
  needed_category_ids.add(category_id)
149
 
150
  # Load only the needed categories from JSON
151
+ # progress_tracker(0.75, desc="Loading category descriptions...") # Removed progress
152
  category_descriptions = {}
153
  if needed_category_ids:
154
  try:
 
209
  final_results = process_in_parallel(
210
  items=product_names,
211
  processor_func=process_category_matching,
212
+ max_workers=min(10, len(product_names)) # Restored max_workers inside the call
213
+ # Removed progress tracking arguments
214
+ ) # Correctly placed closing parenthesis
215
 
216
  # Format results
217
+ # progress_tracker(0.9, desc="Formatting results...") # Removed progress
218
 
219
  # Create a list of result dictionaries in consistent format
220
  formatted_results = []
 
254
  confidence_threshold=confidence_threshold # Pass the threshold to the formatter
255
  )
256
 
257
+ # progress_tracker(1.0, desc="Done!") # Removed progress
258
  return result_html
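Both branches of `categorize_products_with_openai_reranking`, and also `hybrid_ingredient_matching_voyage`, repeat the same three steps:

- Pick the expanded description when one exists.
- Embed it.
- Re-key the resulting vectors by the original product name.

Below is a sketch of that step as a single helper. It assumes the embedder returns a dict keyed by the text it was given, as the old `temp_embeddings[products_for_embedding[i]]` lookup implies. `embed_keyed_by_original` and `embed_fn` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]


def embed_keyed_by_original(
    product_names: Sequence[str],
    embed_fn: Callable[[List[str]], Dict[str, Vector]],
    expanded_descriptions: Optional[Dict[str, str]] = None,
) -> Dict[str, Vector]:
    """Embed expanded text where available, but key results by original name."""
    expanded_descriptions = expanded_descriptions or {}
    # Fall back to the original name when no expansion exists.
    texts = [expanded_descriptions.get(name, name) for name in product_names]
    by_text = embed_fn(texts)
    result: Dict[str, Vector] = {}
    for name, text in zip(product_names, texts):
        if text in by_text:  # skip texts the embedder returned nothing for
            result[name] = by_text[text]
    return result
```

Passing `create_product_embeddings` as `embed_fn` is only valid if its return keys are the input texts. The new `original_products=` argument suggests this may have changed, so check it against `embeddings.py` before relying on this helper.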
ui_hybrid_matching.py CHANGED
@@ -1,5 +1,5 @@
1
- import gradio as gr
2
- from utils import SafeProgress
3
  from category_matching import load_categories, hybrid_category_matching
4
  from similarity import hybrid_ingredient_matching, compute_similarities
5
  from ui_core import embeddings, parse_input
@@ -9,12 +9,13 @@ from api_utils import get_voyage_client
9
 
10
  def categorize_products_with_voyage_reranking(product_input, is_file=False, use_expansion=False,
11
  embedding_top_n=20, final_top_n=5, confidence_threshold=0.5,
12
- match_type="categories", progress=gr.Progress()):
13
  """
14
  Categorize products using Voyage reranking with optional description expansion
15
  """
16
- progress_tracker = SafeProgress(progress)
17
- progress_tracker(0, desc=f"Starting Voyage reranking for {match_type}...")
 
18
 
19
  # Parse input
20
  product_names, error = parse_input(product_input, is_file)
@@ -24,24 +25,24 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
24
  # Optional description expansion
25
  expanded_descriptions = {}
26
  if use_expansion:
27
- progress_tracker(0.3, desc="Expanding product descriptions...")
28
- expanded_descriptions = expand_product_descriptions(product_names, progress=progress)
29
 
30
  match_results = {}
31
  if match_type == "categories":
32
  # Load categories
33
- progress_tracker(0.2, desc="Loading categories...")
34
  categories = load_categories()
35
 
36
  # Use hybrid approach for categories with optional expanded descriptions
37
- progress_tracker(0.5, desc="Finding and re-ranking categories...")
38
  match_results = hybrid_category_matching(
39
  product_names, categories,
40
  embedding_top_n=int(embedding_top_n),
41
- final_top_n=int(final_top_n),
42
  confidence_threshold=0.0, # Don't apply threshold here - do it in display
43
- expanded_descriptions=expanded_descriptions if use_expansion else None,
44
- progress=progress
45
  )
46
  else: # ingredients
47
  # Validate embeddings are loaded
@@ -49,18 +50,18 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
49
  return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No ingredient embeddings loaded. Please check that the embeddings file exists and is properly formatted.</div>"
50
 
51
  # Use hybrid approach for ingredients with optional expanded descriptions
52
- progress_tracker(0.5, desc="Finding and re-ranking ingredients...")
53
  match_results = hybrid_ingredient_matching(
54
  product_names, embeddings,
55
  embedding_top_n=int(embedding_top_n),
56
- final_top_n=int(final_top_n),
57
  confidence_threshold=0.0, # Don't apply threshold here - do it in display
58
- expanded_descriptions=expanded_descriptions if use_expansion else None,
59
- progress=progress
60
  )
61
 
62
  # Format results
63
- progress_tracker(0.9, desc="Formatting results...")
64
 
65
  # Convert to unified format for formatter
66
  formatted_results = []
@@ -109,7 +110,7 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
109
  confidence_threshold=confidence_threshold # Pass the threshold to the formatter
110
  )
111
 
112
- progress_tracker(1.0, desc="Done!")
113
  return result_html
114
 
115
  # Update the function in ui_hybrid_matching.py
@@ -117,13 +118,14 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
117
  embedding_top_n=20, final_top_n=5,
118
  confidence_threshold=0.5,
119
  expanded_descriptions=None,
120
- progress=None):
121
  """Use Voyage AI for reranking instead of OpenAI"""
122
- from utils import SafeProgress
123
  from embeddings import create_product_embeddings
124
 
125
- progress_tracker = SafeProgress(progress, desc="Voyage ingredient matching")
126
- progress_tracker(0.1, desc="Stage 1: Finding candidates with embeddings")
 
127
 
128
  # Stage 1: Same as before - use embeddings to find candidates
129
  if expanded_descriptions:
@@ -131,7 +133,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
131
  products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
132
  # Map expanded descriptions back to original product names for consistent keys
133
  product_embeddings = {}
134
- temp_embeddings = create_product_embeddings(products_for_embedding, progress=progress_tracker)
135
 
136
  # Ensure we use original product names as keys
137
  for i, product_name in enumerate(products):
@@ -139,7 +141,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
139
  product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
140
  else:
141
  # Standard embedding creation with just product names
142
- product_embeddings = create_product_embeddings(products, progress=progress_tracker)
143
 
144
  similarities = compute_similarities(ingredients_dict, product_embeddings)
145
 
@@ -148,7 +150,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
148
  for product, product_similarities in similarities.items():
149
  embedding_results[product] = product_similarities[:embedding_top_n]
150
 
151
- progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI")
152
 
153
  # Initialize Voyage client
154
  voyage_client = get_voyage_client()
@@ -157,7 +159,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
157
  final_results = {}
158
 
159
  for i, product in enumerate(products):
160
- progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}")
161
 
162
  if product not in embedding_results or not embedding_results[product]:
163
  final_results[product] = []
@@ -197,7 +199,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
197
  # Fall back to embedding results
198
  final_results[product] = candidates[:1]
199
 
200
- progress_tracker(1.0, desc="Voyage ingredient matching complete")
201
  return final_results
202
 
203
  # Add this function to ui_hybrid_matching.py
@@ -206,13 +208,14 @@ def hybrid_category_matching_voyage(products, categories_dict,
206
  embedding_top_n=20, final_top_n=5,
207
  confidence_threshold=0.5,
208
  expanded_descriptions=None,
209
- progress=None):
210
  """Use Voyage AI for reranking categories instead of OpenAI"""
211
- from utils import SafeProgress
212
  from embeddings import create_product_embeddings
213
 
214
- progress_tracker = SafeProgress(progress, desc="Voyage category matching")
215
- progress_tracker(0.1, desc="Stage 1: Finding candidate categories with embeddings")
 
216
 
217
  # Stage 1: Same as before - use embeddings to find candidates
218
  if expanded_descriptions:
@@ -220,7 +223,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
220
  products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
221
  # Map expanded descriptions back to original product names for consistent keys
222
  product_embeddings = {}
223
- temp_embeddings = create_product_embeddings(products_for_embedding, progress=progress_tracker)
224
 
225
  # Ensure we use original product names as keys
226
  for i, product_name in enumerate(products):
@@ -228,7 +231,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
228
  product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
229
  else:
230
  # Standard embedding creation with just product names
231
- product_embeddings = create_product_embeddings(products, progress=progress_tracker)
232
 
233
  from similarity import compute_similarities
234
  similarities = compute_similarities(categories_dict, product_embeddings)
@@ -238,7 +241,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
238
  for product, product_similarities in similarities.items():
239
  embedding_results[product] = product_similarities[:embedding_top_n]
240
 
241
- progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI")
242
 
243
  # Initialize Voyage client
244
  voyage_client = get_voyage_client()
@@ -246,7 +249,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
246
  # Stage 2: Re-rank using Voyage AI
247
  final_results = {}
248
  for i, product in enumerate(products):
249
- progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}")
250
 
251
  if product not in embedding_results or not embedding_results[product]:
252
  final_results[product] = []
@@ -286,5 +289,5 @@ def hybrid_category_matching_voyage(products, categories_dict,
286
  # Fall back to embedding results
287
  final_results[product] = candidates[:1]
288
 
289
- progress_tracker(1.0, desc="Voyage category matching complete")
290
  return final_results
 
1
+ # import gradio as gr # Removed Gradio import
2
+ # from utils import SafeProgress # Removed SafeProgress import
3
  from category_matching import load_categories, hybrid_category_matching
4
  from similarity import hybrid_ingredient_matching, compute_similarities
5
  from ui_core import embeddings, parse_input
 
9
 
10
  def categorize_products_with_voyage_reranking(product_input, is_file=False, use_expansion=False,
11
  embedding_top_n=20, final_top_n=5, confidence_threshold=0.5,
12
+ match_type="categories"): # Removed progress parameter
13
  """
14
  Categorize products using Voyage reranking with optional description expansion
15
  """
16
+ # Removed Gradio progress tracking
17
+ # progress_tracker = SafeProgress(progress)
18
+ # progress_tracker(0, desc=f"Starting Voyage reranking for {match_type}...")
19
 
20
  # Parse input
21
  product_names, error = parse_input(product_input, is_file)
 
25
  # Optional description expansion
26
  expanded_descriptions = {}
27
  if use_expansion:
28
+ # progress_tracker(0.3, desc="Expanding product descriptions...") # Removed progress
29
+ expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
30
 
31
  match_results = {}
32
  if match_type == "categories":
33
  # Load categories
34
+ # progress_tracker(0.2, desc="Loading categories...") # Removed progress
35
  categories = load_categories()
36
 
37
  # Use hybrid approach for categories with optional expanded descriptions
38
+ # progress_tracker(0.5, desc="Finding and re-ranking categories...") # Removed progress
39
  match_results = hybrid_category_matching(
40
  product_names, categories,
41
  embedding_top_n=int(embedding_top_n),
42
+ final_top_n=int(final_top_n),
43
  confidence_threshold=0.0, # Don't apply threshold here - do it in display
44
+ expanded_descriptions=expanded_descriptions if use_expansion else None
45
+ # Removed progress argument
46
  )
47
  else: # ingredients
48
  # Validate embeddings are loaded
 
50
  return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No ingredient embeddings loaded. Please check that the embeddings file exists and is properly formatted.</div>"
51
 
52
  # Use hybrid approach for ingredients with optional expanded descriptions
53
+ # progress_tracker(0.5, desc="Finding and re-ranking ingredients...") # Removed progress
54
  match_results = hybrid_ingredient_matching(
55
  product_names, embeddings,
56
  embedding_top_n=int(embedding_top_n),
57
+ final_top_n=int(final_top_n),
58
  confidence_threshold=0.0, # Don't apply threshold here - do it in display
59
+ expanded_descriptions=expanded_descriptions if use_expansion else None
60
+ # Removed progress argument
61
  )
62
 
63
  # Format results
64
+ # progress_tracker(0.9, desc="Formatting results...") # Removed progress
65
 
66
  # Convert to unified format for formatter
67
  formatted_results = []
 
110
  confidence_threshold=confidence_threshold # Pass the threshold to the formatter
111
  )
112
 
113
+ # progress_tracker(1.0, desc="Done!") # Removed progress
114
  return result_html
115
 
116
  # Update the function in ui_hybrid_matching.py
 
118
  embedding_top_n=20, final_top_n=5,
119
  confidence_threshold=0.5,
120
  expanded_descriptions=None,
121
+ ): # Removed progress parameter
122
  """Use Voyage AI for reranking instead of OpenAI"""
123
+ # from utils import SafeProgress # Removed SafeProgress import
124
  from embeddings import create_product_embeddings
125
 
126
+ # Removed Gradio progress tracking
127
+ # progress_tracker = SafeProgress(progress, desc="Voyage ingredient matching")
128
+ # progress_tracker(0.1, desc="Stage 1: Finding candidates with embeddings")
129
 
130
  # Stage 1: Same as before - use embeddings to find candidates
131
  if expanded_descriptions:
 
133
  products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
134
  # Map expanded descriptions back to original product names for consistent keys
135
  product_embeddings = {}
136
+ temp_embeddings = create_product_embeddings(products_for_embedding, original_products=products) # Removed progress, pass original names
137
 
138
  # Ensure we use original product names as keys
139
  for i, product_name in enumerate(products):
 
141
  product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
142
  else:
143
  # Standard embedding creation with just product names
144
+ product_embeddings = create_product_embeddings(products) # Removed progress
145
 
146
  similarities = compute_similarities(ingredients_dict, product_embeddings)
147
 
 
150
  for product, product_similarities in similarities.items():
151
  embedding_results[product] = product_similarities[:embedding_top_n]
152
 
153
+ # progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI") # Removed progress
154
 
155
  # Initialize Voyage client
156
  voyage_client = get_voyage_client()
 
159
  final_results = {}
160
 
161
  for i, product in enumerate(products):
162
+ # progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}") # Removed progress
163
 
164
  if product not in embedding_results or not embedding_results[product]:
165
  final_results[product] = []
 
199
  # Fall back to embedding results
200
  final_results[product] = candidates[:1]
201
 
202
+ # progress_tracker(1.0, desc="Voyage ingredient matching complete") # Removed progress
203
  return final_results
204
 
205
  # Add this function to ui_hybrid_matching.py
 
208
  embedding_top_n=20, final_top_n=5,
209
  confidence_threshold=0.5,
210
  expanded_descriptions=None,
211
+ ): # Removed progress parameter
212
  """Use Voyage AI for reranking categories instead of OpenAI"""
213
+ # from utils import SafeProgress # Removed SafeProgress import
214
  from embeddings import create_product_embeddings
215
 
216
+ # Removed Gradio progress tracking
217
+ # progress_tracker = SafeProgress(progress, desc="Voyage category matching")
218
+ # progress_tracker(0.1, desc="Stage 1: Finding candidate categories with embeddings")
219
 
220
  # Stage 1: Same as before - use embeddings to find candidates
221
  if expanded_descriptions:
 
223
  products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
224
  # Map expanded descriptions back to original product names for consistent keys
225
  product_embeddings = {}
226
+ temp_embeddings = create_product_embeddings(products_for_embedding, original_products=products) # Removed progress, pass original names
227
 
228
  # Ensure we use original product names as keys
229
  for i, product_name in enumerate(products):
 
231
  product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
232
  else:
233
  # Standard embedding creation with just product names
234
+ product_embeddings = create_product_embeddings(products) # Removed progress
235
 
236
  from similarity import compute_similarities
237
  similarities = compute_similarities(categories_dict, product_embeddings)
 
241
  for product, product_similarities in similarities.items():
242
  embedding_results[product] = product_similarities[:embedding_top_n]
243
 
244
+ # progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI") # Removed progress
245
 
246
  # Initialize Voyage client
247
  voyage_client = get_voyage_client()
 
249
  # Stage 2: Re-rank using Voyage AI
250
  final_results = {}
251
  for i, product in enumerate(products):
252
+ # progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}") # Removed progress
253
 
254
  if product not in embedding_results or not embedding_results[product]:
255
  final_results[product] = []
 
289
  # Fall back to embedding results
290
  final_results[product] = candidates[:1]
291
 
292
+ # progress_tracker(1.0, desc="Voyage category matching complete") # Removed progress
293
  return final_results
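Both Voyage functions in this diff use a two-stage pattern. A cheap embedding pass shortlists `embedding_top_n` candidates, then a reranker reorders that shortlist and keeps `final_top_n`. Below is a minimal, dependency-free sketch of the flow. It is not the repository's code. `cosine`, `top_n_by_cosine`, `hybrid_match`, and the `Reranker` callable are hypothetical stand-ins; the real code uses `compute_similarities` and the Voyage client. The diff shows only the fallback line `candidates[:1]` and not what triggers it, so the sketch assumes the fallback runs when the reranker raises.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical reranker: (query, candidate_names) -> [(index_into_names, score)], best first.
Reranker = Callable[[str, List[str]], List[Tuple[int, float]]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_by_cosine(query: Vector, candidates: Dict[str, Vector],
                    n: int) -> List[Tuple[str, float]]:
    """Stage 1: score every candidate against the query and keep the n best."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


def hybrid_match(product_vecs: Dict[str, Vector],
                 candidates: Dict[str, Vector],
                 rerank: Optional[Reranker],
                 embedding_top_n: int = 20,
                 final_top_n: int = 5) -> Dict[str, List[Tuple[str, float]]]:
    results: Dict[str, List[Tuple[str, float]]] = {}
    for product, vec in product_vecs.items():
        shortlist = top_n_by_cosine(vec, candidates, embedding_top_n)
        if not shortlist:
            results[product] = []  # nothing to rerank
            continue
        if rerank is None:
            results[product] = shortlist[:final_top_n]
            continue
        try:
            # Stage 2: let the reranker reorder the shortlisted names.
            ranked = rerank(product, [name for name, _ in shortlist])
            results[product] = [(shortlist[i][0], score)
                                for i, score in ranked[:final_top_n]]
        except Exception:
            # Assumed trigger: on reranker failure, keep only the best embedding hit.
            results[product] = shortlist[:1]
    return results
```

Because the reranker is a plain callable, stage 2 can be tested offline. In the app, it would wrap the Voyage client call.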
ui_ingredient_matching.py CHANGED
@@ -25,7 +25,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
25
  expanded_descriptions = {}
26
  if use_expansion:
27
  progress_tracker(0.2, desc="Expanding product descriptions...")
28
- expanded_descriptions = expand_product_descriptions(product_names, progress=gr.Progress())
29
 
30
  # Create embeddings
31
  progress_tracker(0.4, desc="Generating product embeddings...")
@@ -34,7 +34,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
34
  products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
35
  # Map expanded descriptions back to original product names for consistent keys
36
  products_embeddings = {}
37
- temp_embeddings = create_product_embeddings(products_for_embedding, progress=gr.Progress())
38
 
39
  # Ensure we use original product names as keys
40
  for i, product_name in enumerate(product_names):
@@ -42,7 +42,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
42
  products_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
43
  else:
44
  # Standard embedding creation with just product names
45
- products_embeddings = create_product_embeddings(product_names, progress=gr.Progress())
46
 
47
  if not products_embeddings:
48
  return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: Failed to generate product embeddings. Please try again with different product names.</div>"
 
25
  expanded_descriptions = {}
26
  if use_expansion:
27
  progress_tracker(0.2, desc="Expanding product descriptions...")
28
+ expanded_descriptions = expand_product_descriptions(product_names) # Removed progress
29
 
30
  # Create embeddings
31
  progress_tracker(0.4, desc="Generating product embeddings...")
 
34
  products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
35
  # Map expanded descriptions back to original product names for consistent keys
36
  products_embeddings = {}
37
+ temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names for keys
38
 
39
  # Ensure we use original product names as keys
40
  for i, product_name in enumerate(product_names):
 
42
  products_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
43
  else:
44
  # Standard embedding creation with just product names
45
+ products_embeddings = create_product_embeddings(product_names) # Removed progress
46
 
47
  if not products_embeddings:
48
  return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: Failed to generate product embeddings. Please try again with different product names.</div>"
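Both branches of `categorize_products` embed the expanded description when one exists, but they key the results by the original product name. The sketch below re-keys by position. This avoids depending on how the embedder keys its own output: the new call passes `original_products=`, and its effect lives in embeddings.py, which this view does not show. The sketch also accepts an optional progress callback in place of the removed Gradio `progress` argument. `embed_with_original_keys` and `embed_batch` are hypothetical names, not functions from this repository.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical batch embedder: one vector per input text, in input order.
EmbedFn = Callable[[List[str]], List[List[float]]]
# Optional progress hook: (fraction between 0.0 and 1.0, status text) -> None.
ProgressFn = Callable[[float, str], None]


def embed_with_original_keys(product_names: Sequence[str],
                             embed_batch: EmbedFn,
                             expanded_descriptions: Optional[Dict[str, str]] = None,
                             progress: Optional[ProgressFn] = None
                             ) -> Dict[str, List[float]]:
    report = progress or (lambda fraction, text: None)  # no-op when no UI is attached
    expanded_descriptions = expanded_descriptions or {}

    # Embed the expanded text when available, otherwise the bare product name.
    texts = [expanded_descriptions.get(name, name) for name in product_names]
    report(0.0, "Generating product embeddings...")
    vectors = embed_batch(texts)
    if len(vectors) != len(texts):
        raise ValueError(f"expected {len(texts)} embeddings, got {len(vectors)}")

    report(1.0, "Embeddings ready")
    # Pair by position so keys are always the original product names.
    return dict(zip(product_names, vectors))
```

With Streamlit, one option is to create `bar = st.progress(0.0)` and pass `lambda f, t: bar.progress(f, text=t)` as the callback. Check that your Streamlit version supports the `text` argument.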