Spaces: eliago/product_ingredient_demo

esilver committed on Apr 4

Commit 164730f (1 parent: e314c06)

Use streamlit

Files changed (10)

README.md +66 -31
app.py +43 -22
comparison.py +40 -37
embeddings.py +3 -2
requirements.txt +3 -1
ui.py +244 -208
ui_category_matching.py +13 -12
ui_expanded_matching.py +27 -32
ui_hybrid_matching.py +39 -36
ui_ingredient_matching.py +3 -3

README.md CHANGED

@@ -1,52 +1,87 @@
 ---
 license: mit
-title: Demo
-sdk: gradio
 emoji: 🚀
 colorFrom: purple
 colorTo: yellow
-sdk_version: 5.22.0
 ---
-# Product Categorization App - One-Click Solution
-This is a turnkey solution for categorizing products based on their similarity to ingredients using Voyage AI.
 ## Quick Start
-1. Place your `ingredient_embeddings_voyageai.pkl` file in the same folder as this README
-2. Run the application:
-   ```bash
-   bash run_app.sh
-   ```
-3. That's it! A browser window will open with the app, and a public URL will be created for sharing
-## What You Can Do
-- **Text Input:** Enter product names one per line
-- **File Upload:** Upload a JSON file with product data
-- Adjust the number of categories and Similarity Threshold
-- View the categorization results with confidence scores
 ## Hosting on Hugging Face Spaces
-For permanent, free hosting on Gradio:
-1. Create a free account on [Hugging Face](https://huggingface.co/)
-2. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
-3. Click "Create a Space"
-4. Select "Gradio" as the SDK
-5. Upload all files (including your embeddings file) to the space
-6. Your app will be automatically deployed!
 ## Files Included
-- `app.py`: The main application code
-- `requirements.txt`: Required Python packages
-- `run_app.sh`: One-click deployment script
 ## Requirements
-- Python 3.7+
-- Internet connection (for Voyage AI API)

 ---
 license: mit
+title: Product Categorization Demo
+sdk: streamlit
 emoji: 🚀
 colorFrom: purple
 colorTo: yellow
+# sdk_version: (Streamlit doesn't typically use a fixed version here)
 ---
+# Product Categorization App - Streamlit Demo
+This is a Streamlit application for categorizing products based on their similarity to ingredients or predefined categories using AI embeddings (e.g., Voyage AI) and optional reranking (Voyage AI, OpenAI).
 ## Quick Start
+1.  **Clone the repository:**
+    ```bash
+    git clone <repository_url>
+    cd <repository_directory>
+    ```
+2.  **Create a virtual environment (optional but recommended):**
+    ```bash
+    python -m venv venv
+    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
+    ```
+3.  **Install dependencies:**
+    ```bash
+    pip install -r requirements.txt
+    ```
+4.  **Prepare Embeddings:** Ensure your embedding files (`ingredient_embeddings_voyageai.pkl`, `category_embeddings.pickle`, etc.) are present in the `data/` directory.
+5.  **Configure API Keys:**
+    *   Copy the `.env.example` file (if it exists) or create a new file named `.env`.
+    *   Add your API keys to the `.env` file:
+        ```dotenv
+        VOYAGE_API_KEY="YOUR_VOYAGE_API_KEY_HERE"
+        OPENAI_API_KEY="YOUR_OPENAI_API_KEY_HERE"
+        # Add other keys like CHICORY if needed
+        ```
+6.  **Run the application:**
+    ```bash
+    streamlit run app.py
+    ```
+    Alternatively, if you have configured the `./run_app.sh` script:
+    ```bash
+    ./run_app.sh
+    ```
+7.  The application will open in your default web browser.
+## Features
+-   **Multiple Matching Methods:**
+    -   Ingredient Embeddings
+    -   Category Embeddings
+    -   Voyage AI Reranking (Ingredients/Categories)
+    -   OpenAI Reranking (Ingredients/Categories)
+    -   Comparison View across methods
+-   **Text Input:** Enter product names one per line.
+-   **Description Expansion:** Optionally use OpenAI to expand product descriptions before matching.
+-   **Adjustable Parameters:** Control Top-N results, confidence thresholds, etc. for different methods.
+-   **Example Loading:** Quickly load sample product names.
 ## Hosting on Hugging Face Spaces
+1.  Create a free account on [Hugging Face](https://huggingface.co/).
+2.  Go to [Hugging Face Spaces](https://huggingface.co/spaces).
+3.  Click "Create a new Space".
+4.  Select "Streamlit" as the SDK.
+5.  Choose a repository type (usually Git).
+6.  Upload all project files (including the `data` directory with embeddings) to the space repository.
+7.  **Important:** Add your API keys (`VOYAGE_API_KEY`, `OPENAI_API_KEY`, etc.) as **Secrets** in your Hugging Face Space settings. Do *not* commit the `.env` file directly.
+8.  Your app should build and deploy automatically.
 ## Files Included
+-   `app.py`: The main Streamlit application entry point.
+-   `ui.py`: Defines the Streamlit UI layout and components.
+-   `*.py` (various): Backend logic for embeddings, matching, API calls, formatting.
+-   `requirements.txt`: Required Python packages.
+-   `.env`: File to store API keys (add your keys here, **do not commit**).
+-   `run_app.sh`: Example script to run the app locally.
+-   `data/`: Directory containing embedding files.
 ## Requirements
+-   Python 3.8+
+-   API keys for Voyage AI and/or OpenAI (stored in `.env`).
+-   Internet connection for API calls.
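
The API key setup described in the new README can be verified before any API call is made. The following is a minimal sketch using only the standard library. The helper `missing_api_keys` and the `REQUIRED_KEYS` tuple are hypothetical and not part of this commit; only the variable names `VOYAGE_API_KEY` and `OPENAI_API_KEY` come from the README. Since the README says "Voyage AI and/or OpenAI", which keys are truly required depends on the matching methods in use.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Key names taken from the README; treating both as required is an assumption.
REQUIRED_KEYS = ("VOYAGE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(
    required: Iterable[str] = REQUIRED_KEYS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required keys that are unset or blank."""
    # Read os.environ at call time so values loaded from .env are visible.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Called with no arguments after the `.env` file has been loaded, it returns an empty list when every key is present, so the app could show a warning listing whatever names come back.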

app.py CHANGED

@@ -1,29 +1,50 @@
 import os
 import sys
-import gradio as gr
 from utils import load_embeddings
-from ui import categorize_products, create_demo  # Updated imports
 # Path to the embeddings file
 EMBEDDINGS_PATH = "data/ingredient_embeddings_voyageai.pkl"
-# Check if embeddings file exists
-if not os.path.exists(EMBEDDINGS_PATH):
-    print(f"Error: Embeddings file {EMBEDDINGS_PATH} not found!")
-    print(f"Please ensure the file exists at {os.path.abspath(EMBEDDINGS_PATH)}")
-    sys.exit(1)
-# Load embeddings globally
-try:
-    embeddings_data = load_embeddings(EMBEDDINGS_PATH)
-    # Make embeddings available to the UI functions
-    import ui
-    ui.embeddings = embeddings_data
-except Exception as e:
-    print(f"Error loading embeddings: {e}")
-    sys.exit(1)
-# Launch the Gradio interface
-if __name__ == "__main__":
-    demo = create_demo()
-    demo.launch()

 import os
 import sys
+from dotenv import load_dotenv # Import load_dotenv
+import streamlit as st
+# Load environment variables from .env file at the very beginning
+load_dotenv()
 from utils import load_embeddings
+from ui import render_ui # Import the new Streamlit UI function
+# Set page config as the first Streamlit command
+st.set_page_config(layout="wide", page_title="Product Categorization Tool")
+import ui_core # Import ui_core to set embeddings
 # Path to the embeddings file
 EMBEDDINGS_PATH = "data/ingredient_embeddings_voyageai.pkl"
+# Use Streamlit's caching to load embeddings only once
+@st.cache_data
+def load_all_embeddings(path):
+    """Loads embeddings from the specified path."""
+    if not os.path.exists(path):
+        st.error(f"Error: Embeddings file {path} not found!")
+        st.error(f"Please ensure the file exists at {os.path.abspath(path)}")
+        st.stop() # Stop execution if file not found
+        return None # Return None explicitly, although st.stop() halts
+    try:
+        embeddings_data = load_embeddings(path)
+        return embeddings_data
+    except Exception as e:
+        st.error(f"Error loading embeddings: {e}")
+        st.stop()
+        return None
+# Load embeddings and make them available to UI modules
+embeddings_data = load_all_embeddings(EMBEDDINGS_PATH)
+if embeddings_data:
+    # Pass the loaded embeddings to the ui_core module where other UI modules import it from
+    ui_core.embeddings = embeddings_data
+    # Render the Streamlit UI
+    render_ui()
+else:
+    # This part should ideally not be reached due to st.stop() in load_all_embeddings
+    st.error("Failed to load embeddings. Application cannot start.")
+# Note: No __main__ block needed for Streamlit.
+# Streamlit apps are run using `streamlit run app.py`
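
The `@st.cache_data` decorator in the new `app.py` keeps the embeddings from being reloaded on every Streamlit rerun. The same load-once idea can be sketched outside Streamlit with the standard library's `functools.lru_cache`. This is a stand-in for illustration, not what the commit uses. The name `load_pickle_once` is hypothetical, and the sketch assumes the embeddings file is a pickle, as the `.pkl` extension suggests.

```python
import functools
import os
import pickle
import tempfile


@functools.lru_cache(maxsize=None)
def load_pickle_once(path: str):
    """Load a pickle file; repeated calls with the same path reuse the result."""
    if not os.path.exists(path):
        # Raising (instead of returning None) keeps failures out of the cache.
        raise FileNotFoundError(f"Embeddings file {path} not found")
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo with a throwaway file standing in for the real embeddings.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_embeddings.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump({"salt": [0.1, 0.2]}, f)

    first = load_pickle_once(demo_path)
    second = load_pickle_once(demo_path)  # served from the cache, no disk read
```

One difference worth keeping in mind: `lru_cache` hands back the same object on every call, so callers should treat the result as read-only.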

comparison.py CHANGED

@@ -8,7 +8,7 @@ from similarity import hybrid_ingredient_matching
 from api_utils import process_in_parallel, rank_ingredients_openai
 from ui_formatters import format_comparison_html, create_results_container
-from utils import SafeProgress
 from chicory_api import call_chicory_parser
 from embeddings import create_product_embeddings
 from similarity import compute_similarities
@@ -16,7 +16,7 @@ from similarity import compute_similarities
 def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str, Any],
                             embedding_top_n: int = 20, final_top_n: int = 3,
                             confidence_threshold: float = 0.5, match_type="ingredients",
-                            progress=None, expanded_descriptions=None) -> Dict[str, Dict[str, List[Tuple]]]:
     """
     Compare multiple ingredient/category matching methods on the same products
@@ -43,20 +43,21 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
         else:
             print(f"WARNING: First product '{products[0] if products else 'None'}' not found in expanded descriptions")
-    progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
     # Step 1: Generate embeddings for all products (used by multiple methods)
-    progress_tracker(0.1, desc="Generating product embeddings")
     # Use expanded descriptions for embeddings if available
     if expanded_descriptions:
         expanded_product_texts = [expanded_descriptions.get(p, p) for p in products]
-        product_embeddings = create_product_embeddings(expanded_product_texts, progress=progress_tracker,
-                                                      original_products=products)  # Keep original product IDs
     else:
-        product_embeddings = create_product_embeddings(products, progress=progress_tracker)
     # Step 2: Get embedding-based candidates for all products
-    progress_tracker(0.2, desc="Finding embedding candidates")
     similarities = compute_similarities(ingredients_dict, product_embeddings)
     # Filter to top N candidates per product
@@ -65,11 +66,11 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
         embedding_results[product] = product_similarities[:embedding_top_n]
     # Step 3: Process with Chicory Parser
-    progress_tracker(0.3, desc="Running Chicory Parser")
     # Import here to avoid circular imports
     # from chicory_parser import parse_products
-    chicory_results = call_chicory_parser(products, progress=progress_tracker)
     # Initialize result structure
     comparison_results = {}
@@ -103,7 +104,7 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
         comparison_results[product]["chicory"] = chicory_matches
     # Step 4: Process with Voyage AI
-    progress_tracker(0.4, desc="Processing with Voyage AI")
     # Define processing function for Voyage
     def process_voyage(product):
@@ -156,13 +157,17 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
                 # Ensure results are in the expected format
                 formatted_results = []
                 for r in results[:final_top_n]:
                     if isinstance(r, dict) and "name" in r and "score" in r:
                         # Convert score to float to ensure type compatibility
                         try:
                             score = float(r["score"])
                             if score >= confidence_threshold:
-                                formatted_results.append((r["name"], score))
                         except (ValueError, TypeError):
                             print(f"Invalid score format in result: {r}")
                     elif isinstance(r, tuple) and len(r) >= 2:
@@ -177,7 +182,9 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
                                 name = r[0]
                             if score >= confidence_threshold:
-                                formatted_results.append((name, score))
                         except (ValueError, TypeError):
                             print(f"Invalid score format in tuple: {r}")
@@ -197,11 +204,8 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
     voyage_results = process_in_parallel(
         items=products,
         processor_func=process_voyage,
-        max_workers=min(20, len(products)),
-        progress_tracker=progress_tracker,
-        progress_start=0.4,
-        progress_end=0.65,
-        progress_desc="Voyage AI"
     )
     # Update comparison results with Voyage results
@@ -210,7 +214,7 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
             comparison_results[product]["voyage"] = results
     # Step 5: Process with OpenAI
-    progress_tracker(0.7, desc="Running OpenAI processing in parallel")
     # Define processing function for OpenAI
     def process_openai(product):
@@ -261,11 +265,8 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
     openai_results = process_in_parallel(
         items=products,
         processor_func=process_openai,
-        max_workers=min(20, len(products)),
-        progress_tracker=progress_tracker,
-        progress_start=0.7,
-        progress_end=0.95,
-        progress_desc="OpenAI"
     )
     # Update comparison results with OpenAI results
@@ -303,12 +304,12 @@ def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str,
             method_results[method] = formatted_results
-    progress_tracker(1.0, desc="Comparison complete")
     return comparison_results
 def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
                                 final_top_n=3, confidence_threshold=0.5,
-                                match_type="categories", use_expansion=False, progress=None):
     """
     Compare multiple ingredient matching methods on the same products
@@ -324,10 +325,12 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
     Returns:
         HTML formatted comparison results
     """
-    from utils import SafeProgress, load_embeddings
-    progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
-    progress_tracker(0.1, desc="Processing input")
     # Split text input by lines and remove empty lines
     if not product_input:
@@ -338,7 +341,7 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
     # Load appropriate embeddings based on match type
     try:
-        progress_tracker(0.2, desc="Loading embeddings")
         if match_type == "ingredients":
             embeddings_path = "data/ingredient_embeddings_voyageai.pkl"
             embeddings_dict = load_embeddings(embeddings_path)
@@ -355,20 +358,20 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
         # Expand descriptions if requested
         if use_expansion:
             from openai_expansion import expand_product_descriptions
-            progress_tracker(0.25, desc="Expanding product descriptions")
-            expanded_products = expand_product_descriptions(product_names, progress=progress_tracker)
             # Add at beginning of results
             header_text = f"Comparing {len(product_names)} products using multiple {match_type} matching methods WITH expanded descriptions."
-        progress_tracker(0.3, desc="Comparing methods")
         comparison_results = compare_ingredient_methods(
             products=product_names,
             ingredients_dict=embeddings_dict,
             embedding_top_n=embedding_top_n,
             final_top_n=final_top_n,
             confidence_threshold=confidence_threshold,
-            match_type=match_type,
-            progress=progress_tracker,
             expanded_descriptions=expanded_products
         )
     except Exception as e:
@@ -377,7 +380,7 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
         return f"<div style='color: red;'>Error comparing methods: {str(e)}<br><pre>{error_details}</pre></div>"
     # Format results as HTML using centralized formatters
-    progress_tracker(0.9, desc="Formatting results")
     result_elements = []
     for product in product_names:
         if product in comparison_results:
@@ -393,5 +396,5 @@ def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
         header_text=header_text
     )
-    progress_tracker(1.0, desc="Complete")
     return output_html

 from api_utils import process_in_parallel, rank_ingredients_openai
 from ui_formatters import format_comparison_html, create_results_container
+# from utils import SafeProgress # Removed SafeProgress import
 from chicory_api import call_chicory_parser
 from embeddings import create_product_embeddings
 from similarity import compute_similarities
 def compare_ingredient_methods(products: List[str], ingredients_dict: Dict[str, Any],
                             embedding_top_n: int = 20, final_top_n: int = 3,
                             confidence_threshold: float = 0.5, match_type="ingredients",
+                            expanded_descriptions=None) -> Dict[str, Dict[str, List[Tuple]]]: # Removed progress parameter
     """
     Compare multiple ingredient/category matching methods on the same products
         else:
             print(f"WARNING: First product '{products[0] if products else 'None'}' not found in expanded descriptions")
+    # Removed Gradio progress tracking
+    # progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
     # Step 1: Generate embeddings for all products (used by multiple methods)
+    # progress_tracker(0.1, desc="Generating product embeddings") # Removed progress
     # Use expanded descriptions for embeddings if available
     if expanded_descriptions:
         expanded_product_texts = [expanded_descriptions.get(p, p) for p in products]
+        product_embeddings = create_product_embeddings(expanded_product_texts,
+                                                      original_products=products)  # Keep original product IDs, removed progress
     else:
+        product_embeddings = create_product_embeddings(products) # Removed progress
     # Step 2: Get embedding-based candidates for all products
+    # progress_tracker(0.2, desc="Finding embedding candidates") # Removed progress
     similarities = compute_similarities(ingredients_dict, product_embeddings)
     # Filter to top N candidates per product
         embedding_results[product] = product_similarities[:embedding_top_n]
     # Step 3: Process with Chicory Parser
+    # progress_tracker(0.3, desc="Running Chicory Parser") # Removed progress
     # Import here to avoid circular imports
     # from chicory_parser import parse_products
+    chicory_results = call_chicory_parser(products) # Removed progress
     # Initialize result structure
     comparison_results = {}
         comparison_results[product]["chicory"] = chicory_matches
     # Step 4: Process with Voyage AI
+    # progress_tracker(0.4, desc="Processing with Voyage AI") # Removed progress
     # Define processing function for Voyage
     def process_voyage(product):
                 # Ensure results are in the expected format
                 formatted_results = []
+                added_ids = set() # Keep track of added category IDs to avoid duplicates
                 for r in results[:final_top_n]:
                     if isinstance(r, dict) and "name" in r and "score" in r:
                         # Convert score to float to ensure type compatibility
                         try:
                             score = float(r["score"])
+                            name = r["name"] # Extract name for check
                             if score >= confidence_threshold:
+                                if name not in added_ids: # Check for duplicates
+                                    formatted_results.append((name, score))
+                                    added_ids.add(name) # Add ID to set
                         except (ValueError, TypeError):
                             print(f"Invalid score format in result: {r}")
                     elif isinstance(r, tuple) and len(r) >= 2:
                                 name = r[0]
                             if score >= confidence_threshold:
+                                if name not in added_ids: # Check for duplicates
+                                    formatted_results.append((name, score))
+                                    added_ids.add(name) # Add ID to set
                         except (ValueError, TypeError):
                             print(f"Invalid score format in tuple: {r}")
     voyage_results = process_in_parallel(
         items=products,
         processor_func=process_voyage,
+        max_workers=min(20, len(products))
+        # Removed ALL progress tracking arguments
     )
     # Update comparison results with Voyage results
             comparison_results[product]["voyage"] = results
     # Step 5: Process with OpenAI
+    # progress_tracker(0.7, desc="Running OpenAI processing in parallel") # Removed progress
     # Define processing function for OpenAI
     def process_openai(product):
     openai_results = process_in_parallel(
         items=products,
         processor_func=process_openai,
+        max_workers=min(20, len(products))
+        # Removed ALL progress tracking arguments
     )
     # Update comparison results with OpenAI results
             method_results[method] = formatted_results
+    # progress_tracker(1.0, desc="Comparison complete") # Removed progress
     return comparison_results
 def compare_ingredient_methods_ui(product_input, embedding_top_n=20,
                                 final_top_n=3, confidence_threshold=0.5,
+                                match_type="categories", use_expansion=False): # Removed progress parameter
     """
     Compare multiple ingredient matching methods on the same products
     Returns:
         HTML formatted comparison results
     """
+    # from utils import SafeProgress # Removed SafeProgress import
+    from utils import load_embeddings
+    # Removed Gradio progress tracking
+    # progress_tracker = SafeProgress(progress, desc="Comparing matching methods")
+    # progress_tracker(0.1, desc="Processing input")
     # Split text input by lines and remove empty lines
     if not product_input:
     # Load appropriate embeddings based on match type
     try:
+        # progress_tracker(0.2, desc="Loading embeddings") # Removed progress
         if match_type == "ingredients":
             embeddings_path = "data/ingredient_embeddings_voyageai.pkl"
             embeddings_dict = load_embeddings(embeddings_path)
         # Expand descriptions if requested
         if use_expansion:
             from openai_expansion import expand_product_descriptions
+            # progress_tracker(0.25, desc="Expanding product descriptions") # Removed progress
+            expanded_products = expand_product_descriptions(product_names) # Removed progress argument
             # Add at beginning of results
             header_text = f"Comparing {len(product_names)} products using multiple {match_type} matching methods WITH expanded descriptions."
+        # progress_tracker(0.3, desc="Comparing methods") # Removed progress
         comparison_results = compare_ingredient_methods(
             products=product_names,
             ingredients_dict=embeddings_dict,
             embedding_top_n=embedding_top_n,
             final_top_n=final_top_n,
             confidence_threshold=confidence_threshold,
+            match_type=match_type, # Added missing comma
+            # Removed progress argument
             expanded_descriptions=expanded_products
         )
     except Exception as e:
         return f"<div style='color: red;'>Error comparing methods: {str(e)}<br><pre>{error_details}</pre></div>"
     # Format results as HTML using centralized formatters
+    # progress_tracker(0.9, desc="Formatting results") # Removed progress
     result_elements = []
     for product in product_names:
         if product in comparison_results:
         header_text=header_text
     )
+    # progress_tracker(1.0, desc="Complete") # Removed progress
     return output_html
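
The `added_ids` change in `process_voyage` filters results by threshold and skips names that were already emitted. Isolated from the API calls, the logic for dict-shaped results could look like the sketch below. `dedupe_results` is a hypothetical helper, not a function in this commit, and the sample data is made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def dedupe_results(
    results: List[Dict[str, Any]],
    final_top_n: int = 3,
    confidence_threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep the first occurrence of each name among the top-N results
    whose score meets the threshold."""
    formatted: List[Tuple[str, float]] = []
    added_ids = set()  # names already emitted
    for r in results[:final_top_n]:
        try:
            score = float(r["score"])  # scores may arrive as strings
        except (KeyError, ValueError, TypeError):
            continue  # skip malformed entries
        name = r.get("name")
        if name is None or score < confidence_threshold or name in added_ids:
            continue
        formatted.append((name, score))
        added_ids.add(name)
    return formatted


sample = [
    {"name": "relish", "score": "0.9"},
    {"name": "relish", "score": 0.8},   # duplicate name, dropped
    {"name": "pickle", "score": 0.4},   # below threshold, dropped
    {"name": "parsley", "score": 0.7},  # outside the top 3, never considered
]
print(dedupe_results(sample))  # → [('relish', 0.9)]
```

As in the diff, the slice to `final_top_n` happens before deduplication, so duplicates can leave fewer than `final_top_n` entries in the output.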

embeddings.py CHANGED

@@ -6,8 +6,9 @@ import time
 import numpy as np
 from concurrent.futures import ThreadPoolExecutor
-# Set Voyage AI API key directly
-voyageai.api_key = os.getenv("VOYAGE_API_KEY")
 def get_embeddings_batch(texts, model="voyage-3-large", batch_size=100):
     """Get embeddings for a list of texts in batches"""

 import numpy as np
 from concurrent.futures import ThreadPoolExecutor
+# Voyage AI API key is now loaded via environment variable
+# when voyageai.Client() is initialized (after load_dotenv runs)
+# voyageai.api_key = os.getenv("VOYAGE_API_KEY") # Removed global setting
 def get_embeddings_batch(texts, model="voyage-3-large", batch_size=100):
     """Get embeddings for a list of texts in batches"""

requirements.txt CHANGED

@@ -1,6 +1,8 @@
 voyageai
 numpy
-gradio
 openai
 requests
 tqdm

 voyageai
 numpy
+streamlit
+pandas
 openai
 requests
 tqdm
+python-dotenv

ui.py CHANGED

@@ -1,222 +1,258 @@
-import gradio as gr
 from comparison import compare_ingredient_methods_ui
-# Import from our UI modules
-from ui_core import embeddings, get_css, load_examples
 from ui_ingredient_matching import categorize_products
 from ui_category_matching import categorize_products_by_category
 from ui_hybrid_matching import categorize_products_with_voyage_reranking
 from ui_expanded_matching import categorize_products_with_openai_reranking
-def create_demo():
-    """Create the Gradio interface"""
-    with gr.Blocks(css=get_css()) as demo:
-        gr.Markdown("# Product Categorization Tool\nAnalyze products by matching to ingredients or categories using AI embeddings.")
-        with gr.Tabs() as tabs:
-            # Original Ingredient Matching Tab
-            with gr.TabItem("Ingredient Embeddings"):
-                with gr.Row():
-                    with gr.Column(scale=1):
-                        # Input section
-                        text_input = gr.Textbox(
-                            lines=10,
-                            placeholder="Enter product names, one per line",
-                            label="Product Names"
-                        )
-                        input_controls = gr.Row()
-                        with input_controls:
-                            use_expansion = gr.Checkbox(
-                                value=False,
-                                label="Use Description Expansion",
-                                info="Expand product descriptions using AI before matching"
-                            )
-                            top_n = gr.Slider(1, 25, 10, step=1, label="Top N Results")
-                            confidence = gr.Slider(0.1, 0.9, 0.5, label="Similarity Threshold")
-                        with gr.Row():
-                            examples_btn = gr.Button("Load Examples", variant="secondary")
-                            categorize_btn = gr.Button("Find Similar Ingredients", variant="primary")
-                    with gr.Column(scale=1):
-                        # Results section
-                        text_output = gr.HTML(label="Similar Ingredients Results", elem_id="results-container")
-            # New Category Matching Tab
-            with gr.TabItem("Category Embeddings"):
-                with gr.Row():
-                    with gr.Column(scale=1):
-                        # Input section
-                        category_text_input = gr.Textbox(
-                            lines=10,
-                            placeholder="Enter product names, one per line",
-                            label="Product Names"
-                        )
-                        category_input_controls = gr.Row()
-                        with category_input_controls:
-                            category_use_expansion = gr.Checkbox(
-                                value=False,
-                                label="Use Description Expansion",
-                                info="Expand product descriptions using AI before matching"
-                            )
-                            category_top_n = gr.Slider(1, 10, 5, step=1, label="Top N Categories")
-                            category_confidence = gr.Slider(0.1, 0.9, 0.5, label="Matching Threshold")
-                        with gr.Row():
-                            category_examples_btn = gr.Button("Load Examples", variant="secondary")
-                            match_categories_btn = gr.Button("Match to Categories", variant="primary")
-                    with gr.Column(scale=1):
-                        # Results section
-                        category_output = gr.HTML(label="Category Matching Results", elem_id="results-container")
-            # Common function to create reranking UI tabs
-            def create_reranking_tab(tab_name, fn_name, default_match="ingredients"):
-                with gr.TabItem(tab_name):
-                    with gr.Row():
-                        with gr.Column(scale=1):
-                            # Input section
-                            tab_input = gr.Textbox(
-                                lines=10,
-                                placeholder="Enter product names, one per line",
-                                label="Product Names"
-                            )
-                            with gr.Row():
-                                tab_expansion = gr.Checkbox(
-                                    value=False,
-                                    label="Use Description Expansion",
-                                    info="Expand product descriptions using AI before matching"
-                                )
-                                tab_emb_top_n = gr.Slider(1, 50, 20, step=1, label="Embedding Top N Results")
-                                tab_top_n = gr.Slider(1, 10, 5, step=1, label="Final Top N Results")
-                                tab_confidence = gr.Slider(0.1, 0.9, 0.5, label="Matching Threshold")
-                            tab_match_type = gr.Radio(
-                                choices=["categories", "ingredients"],
-                                value=default_match,
-                                label="Match Type",
-                                info="Choose whether to match against ingredients or categories"
-                            )
-                            with gr.Row():
-                                tab_examples_btn = gr.Button("Load Examples", variant="secondary")
-                                tab_match_btn = gr.Button(f"Match using {tab_name}", variant="primary")
-                        with gr.Column(scale=1):
-                            # Results section
-                            tab_output = gr.HTML(label=f"{tab_name} Results", elem_id="results-container")
-                    # Connect button events
-                    tab_match_btn.click(
-                        fn=fn_name,
-                        inputs=[tab_input, gr.State(False), tab_expansion, tab_emb_top_n,
-                                tab_top_n, tab_confidence, tab_match_type],
-                        outputs=[tab_output],
                     )
-                    tab_examples_btn.click(
-                        fn=load_examples,
-                        inputs=[],
-                        outputs=tab_input
                     )
-            # Create the reranking tabs using the shared function
-            create_reranking_tab("Voyage AI Reranking", categorize_products_with_voyage_reranking, "categories")
-            create_reranking_tab("OpenAI Reranking", categorize_products_with_openai_reranking, "categories")
-            # New Comparison Tab
-            with gr.TabItem("Compare Methods"):
-                with gr.Row():
-                    with gr.Column():
-                        compare_product_input = gr.Textbox(
-                            label="Enter product names (one per line)",
-                            placeholder="4 Tbsp sweet pickle relish\nchocolate chips\nfresh parsley",
-                            lines=5
-                        )
-                        with gr.Row():
-                            compare_embedding_top_n = gr.Slider(
-                                minimum=5, maximum=50, value=20, step=5,
-                                label="Initial embedding candidates"
-                            )
-                            compare_final_top_n = gr.Slider(
-                                minimum=1, maximum=10, value=3, step=1,
-                                label="Final results per method"
-                            )
-                            compare_confidence_threshold = gr.Slider(
-                                minimum=0.0, maximum=1.0, value=0.5, step=0.05,
-                                label="Confidence threshold"
-                            )
-                        compare_match_type = gr.Radio(
-                            choices=["categories", "ingredients"],
-                            value="categories",
-                            label="Match Type",
-                            info="Choose whether to match against ingredients or categories"
-                        )
-                        # Add expansion checkbox
-                        compare_expansion = gr.Checkbox(
-                            value=False,
-                            label="Use Description Expansion",
-                            info="Expand product descriptions using AI before matching"
                         )
-                        compare_btn = gr.Button("Compare Methods", variant="primary")
-                        compare_examples_btn = gr.Button("Load Examples", variant="secondary")
-                    with gr.Column():
-                        comparison_output = gr.HTML(label="Results", elem_id="results-container")
-                # Connect the compare button
-                compare_btn.click(
-                    fn=compare_ingredient_methods_ui,
-                    inputs=[
-                        compare_product_input,
                         compare_embedding_top_n,
                         compare_final_top_n,
                         compare_confidence_threshold,
                         compare_match_type,
                         compare_expansion
-                    ],
-                    outputs=comparison_output
-                )
-                # Add examples button functionality
-                compare_examples_btn.click(
-                    fn=load_examples,
-                    inputs=[],
-                    outputs=compare_product_input
-                )
-        # Connect buttons for ingredient matching
-        categorize_btn.click(
-            fn=categorize_products,
-            inputs=[text_input, gr.State(False), use_expansion, top_n, confidence],
-            outputs=[text_output],
-        )
-        # Connect buttons for category matching
-        match_categories_btn.click(
-            fn=categorize_products_by_category,
-            inputs=[category_text_input, gr.State(False), category_use_expansion, category_top_n, category_confidence],
-            outputs=[category_output],
-        )
-        # Examples buttons for the first two tabs
-        examples_btn.click(
-            fn=load_examples,
-            inputs=[],
-            outputs=text_input
-        )
-        category_examples_btn.click(
-            fn=load_examples,  # Reuse the same examples
-            inputs=[],
-            outputs=category_text_input
-        )
-        gr.Markdown("Powered by Voyage AI embeddings • Built with Gradio")
-    return demo

+import streamlit as st
+import pandas as pd
 from comparison import compare_ingredient_methods_ui
+from ui_core import embeddings, load_examples
 from ui_ingredient_matching import categorize_products
 from ui_category_matching import categorize_products_by_category
 from ui_hybrid_matching import categorize_products_with_voyage_reranking
 from ui_expanded_matching import categorize_products_with_openai_reranking
+# Removed unused import: from ui_formatters import format_results_html
+# Initialize session state keys if they don't exist
+if 'ingredient_input' not in st.session_state:
+    st.session_state.ingredient_input = ""
+if 'category_input' not in st.session_state:
+    st.session_state.category_input = ""
+if 'voyage_input' not in st.session_state:
+    st.session_state.voyage_input = ""
+if 'openai_input' not in st.session_state:
+    st.session_state.openai_input = ""
+if 'compare_input' not in st.session_state:
+    st.session_state.compare_input = ""
+def render_ui():
+    """Render the Streamlit interface"""
+    # Page config is now set in app.py
+    st.title("Product Categorization Tool")
+    st.markdown("Analyze products by matching to ingredients or categories using AI embeddings.")
+    # Use st.tabs for the different sections
+    tab_ingredient, tab_category, tab_voyage, tab_openai, tab_compare = st.tabs([
+        "Ingredient Embeddings",
+        "Category Embeddings",
+        "Voyage AI Reranking",
+        "OpenAI Reranking",
+        "Compare Methods"
+    ])
+    # --- Ingredient Matching Tab ---
+    with tab_ingredient:
+        st.header("Match Products to Ingredients")
+        col1, col2 = st.columns(2)
+        with col1:
+            # Handle button click *before* rendering the text area
+            if st.button("Load Examples", key="ingredient_examples"):
+                st.session_state.ingredient_input = load_examples()  # Set state before the text area below is rendered
+            # Input section - Use the session state value
+            text_input = st.text_area(
+                "Product Names (one per line)",
+                value=st.session_state.ingredient_input, # Use value from state
+                placeholder="Enter product names, one per line",
+                height=250,
+                key="ingredient_input_widget"  # Widget key, kept distinct from the session-state key that stores the text
+            )
+            # Update session state if user types manually
+            st.session_state.ingredient_input = text_input
+            use_expansion = st.checkbox(
+                "Use Description Expansion (AI)",
+                value=False,
+                key="ingredient_expansion",
+                help="Expand product descriptions using AI before matching"
+            )
+            top_n = st.slider("Top N Results", 1, 25, 10, step=1, key="ingredient_top_n")
+            confidence = st.slider("Similarity Threshold", 0.1, 0.9, 0.5, step=0.05, key="ingredient_confidence")
+            find_ingredients_btn = st.button("Find Similar Ingredients", type="primary", key="ingredient_find")
+        with col2:
+            # Results section
+            st.subheader("Results")
+            results_placeholder_ingredient = st.empty()
+            if find_ingredients_btn:
+                if st.session_state.ingredient_input: # Check state value
+                    results_html = categorize_products(
+                        st.session_state.ingredient_input,
+                        False,
+                        use_expansion,
+                        top_n,
+                        confidence
                     )
+                    results_placeholder_ingredient.markdown(results_html, unsafe_allow_html=True)
+                else:
+                    results_placeholder_ingredient.warning("Please enter product names.")
+    # --- Category Matching Tab ---
+    with tab_category:
+        st.header("Match Products to Categories")
+        col1, col2 = st.columns(2)
+        with col1:
+            if st.button("Load Examples", key="category_examples"):
+                st.session_state.category_input = load_examples()
+            category_text_input = st.text_area(
+                "Product Names (one per line)",
+                value=st.session_state.category_input,
+                placeholder="Enter product names, one per line",
+                height=250,
+                key="category_input_widget"
+            )
+            st.session_state.category_input = category_text_input
+            category_use_expansion = st.checkbox(
+                "Use Description Expansion (AI)",
+                value=False,
+                key="category_expansion",
+                help="Expand product descriptions using AI before matching"
+            )
+            category_top_n = st.slider("Top N Categories", 1, 10, 5, step=1, key="category_top_n")
+            category_confidence = st.slider("Matching Threshold", 0.1, 0.9, 0.5, step=0.05, key="category_confidence")
+            match_categories_btn = st.button("Match to Categories", type="primary", key="category_match")
+        with col2:
+            st.subheader("Results")
+            results_placeholder_category = st.empty()
+            if match_categories_btn:
+                if st.session_state.category_input:
+                    results_html = categorize_products_by_category(
+                        st.session_state.category_input,
+                        False,
+                        category_use_expansion,
+                        category_top_n,
+                        category_confidence
                     )
+                    results_placeholder_category.markdown(results_html, unsafe_allow_html=True)
+                else:
+                    results_placeholder_category.warning("Please enter product names.")
+    # --- Common function for Reranking Tabs ---
+    def create_reranking_ui(tab, tab_key_prefix, tab_name, backend_function, default_match="categories"):
+        with tab:
+            st.header(f"Match using {tab_name}")
+            col1, col2 = st.columns(2)
+            with col1:
+                if st.button("Load Examples", key=f"{tab_key_prefix}_examples"):
+                    st.session_state[f"{tab_key_prefix}_input"] = load_examples()
+                tab_input_value = st.text_area(
+                    "Product Names (one per line)",
+                    value=st.session_state[f"{tab_key_prefix}_input"],
+                    placeholder="Enter product names, one per line",
+                    height=250,
+                    key=f"{tab_key_prefix}_input_widget"
+                )
+                st.session_state[f"{tab_key_prefix}_input"] = tab_input_value # Update state
+                tab_expansion = st.checkbox(
+                    "Use Description Expansion (AI)",
+                    value=False,
+                    key=f"{tab_key_prefix}_expansion",
+                    help="Expand product descriptions using AI before matching"
+                )
+                tab_emb_top_n = st.slider("Embedding Top N Results", 1, 50, 20, step=1, key=f"{tab_key_prefix}_emb_top_n")
+                tab_top_n = st.slider("Final Top N Results", 1, 10, 5, step=1, key=f"{tab_key_prefix}_final_top_n")
+                tab_confidence = st.slider("Matching Threshold", 0.1, 0.9, 0.5, step=0.05, key=f"{tab_key_prefix}_confidence")
+                tab_match_type = st.radio(
+                    "Match Type",
+                    options=["categories", "ingredients"],
+                    index=0 if default_match == "categories" else 1,
+                    key=f"{tab_key_prefix}_match_type",
+                    horizontal=True,
+                    help="Choose whether to match against ingredients or categories"
+                )
+                tab_match_btn = st.button(f"Match using {tab_name}", type="primary", key=f"{tab_key_prefix}_match")
+            with col2:
+                st.subheader("Results")
+                results_placeholder_rerank = st.empty()
+                if tab_match_btn:
+                    if st.session_state[f"{tab_key_prefix}_input"]:
+                        results_html = backend_function(
+                            st.session_state[f"{tab_key_prefix}_input"],
+                            False,
+                            tab_expansion,
+                            tab_emb_top_n,
+                            tab_top_n,
+                            tab_confidence,
+                            tab_match_type
                         )
+                        results_placeholder_rerank.markdown(results_html, unsafe_allow_html=True)
+                    else:
+                        results_placeholder_rerank.warning("Please enter product names.")
+    # Create the reranking tabs
+    create_reranking_ui(tab_voyage, "voyage", "Voyage AI Reranking", categorize_products_with_voyage_reranking, "categories")
+    create_reranking_ui(tab_openai, "openai", "OpenAI Reranking", categorize_products_with_openai_reranking, "categories")
+    # --- Compare Methods Tab ---
+    with tab_compare:
+        st.header("Compare Matching Methods")
+        col1, col2 = st.columns(2)
+        with col1:
+            if st.button("Load Examples", key="compare_examples"):
+                st.session_state.compare_input = load_examples()
+            compare_product_input_value = st.text_area(
+                "Product Names (one per line)",
+                value=st.session_state.compare_input,
+                placeholder="4 Tbsp sweet pickle relish\nchocolate chips\nfresh parsley",
+                height=200,
+                key="compare_input_widget"
+            )
+            st.session_state.compare_input = compare_product_input_value # Update state
+            compare_embedding_top_n = st.slider(
+                "Initial embedding candidates",
+                min_value=5, max_value=50, value=20, step=5,
+                key="compare_emb_top_n"
+            )
+            compare_final_top_n = st.slider(
+                "Final results per method",
+                min_value=1, max_value=10, value=3, step=1,
+                key="compare_final_top_n"
+            )
+            compare_confidence_threshold = st.slider(
+                "Confidence threshold",
+                min_value=0.0, max_value=1.0, value=0.5, step=0.05,
+                key="compare_confidence"
+            )
+            compare_match_type = st.radio(
+                "Match Type",
+                options=["categories", "ingredients"],
+                index=0,
+                key="compare_match_type",
+                horizontal=True,
+                help="Choose whether to match against ingredients or categories"
+            )
+            compare_expansion = st.checkbox(
+                "Use Description Expansion (AI)",
+                value=False,
+                key="compare_expansion",
+                help="Expand product descriptions using AI before matching"
+            )
+            compare_btn = st.button("Compare Methods", type="primary", key="compare_run")
+        with col2:
+            st.subheader("Comparison Results")
+            results_placeholder_compare = st.empty()
+            if compare_btn:
+                if st.session_state.compare_input:
+                    results_html = compare_ingredient_methods_ui(
+                        st.session_state.compare_input,
                         compare_embedding_top_n,
                         compare_final_top_n,
                         compare_confidence_threshold,
                         compare_match_type,
                         compare_expansion
+                    )
+                    results_placeholder_compare.markdown(results_html, unsafe_allow_html=True)
+                else:
+                    results_placeholder_compare.warning("Please enter product names.")
+    st.markdown("---")
+    st.markdown("Powered by Voyage AI embeddings • Built with Streamlit")
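
Review note on the `ui.py` changes above: the five `if '..._input' not in st.session_state` guards and the repeated "Load Examples" handlers follow one pattern, which could be factored into two small helpers. The sketch below is a minimal illustration, not the commit's code: `init_input_state` and `fill_examples` are hypothetical names, and a plain `dict` stands in for `st.session_state` so the snippet runs without Streamlit.

```python
from typing import Callable, Iterable, MutableMapping

# State keys taken from the diff above; the helper names are hypothetical.
INPUT_KEYS = (
    "ingredient_input",
    "category_input",
    "voyage_input",
    "openai_input",
    "compare_input",
)


def init_input_state(state: MutableMapping[str, str],
                     keys: Iterable[str] = INPUT_KEYS,
                     default: str = "") -> None:
    """Create each missing key with a default, leaving existing values alone."""
    for key in keys:
        if key not in state:
            state[key] = default


def fill_examples(state: MutableMapping[str, str],
                  key: str,
                  loader: Callable[[], str]) -> None:
    """Store the loader's example text under the given state key."""
    state[key] = loader()


# Stand-in for st.session_state (assumption: any mutable mapping works here).
state = {"category_input": "existing text"}
init_input_state(state)
fill_examples(state, "compare_input", lambda: "chocolate chips\nfresh parsley")
```

In the app, `state` would be `st.session_state` and `loader` would be `load_examples`. `st.button` also accepts `on_click` and `args`, so `fill_examples` could be registered as a click callback instead of being called inline; whether that behaves better than the inline assignment has not been checked against this app.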

ui_category_matching.py CHANGED

@@ -1,5 +1,5 @@
-import gradio as gr
-from utils import SafeProgress
 from category_matching import load_categories, match_products_to_categories
 from ui_core import parse_input
 from ui_formatters import format_categories_html
@@ -8,8 +8,9 @@ from openai_expansion import expand_product_descriptions
 def categorize_products_by_category(product_input, is_file=False, use_expansion=False, top_n=10, confidence_threshold=0.5):
     """Categorize products by matching them to predefined categories"""
-    progress_tracker = SafeProgress(gr.Progress())
-    progress_tracker(0, desc="Starting categorization...")
     # Parse input
     product_names, error = parse_input(product_input, is_file)
@@ -19,15 +20,15 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
     # Optional description expansion
     expanded_descriptions = {}
     if use_expansion:
-        progress_tracker(0.1, desc="Expanding product descriptions...")
-        expanded_descriptions = expand_product_descriptions(product_names, progress=progress_tracker)
         # Use expanded descriptions for matching if available
         products_to_match = [expanded_descriptions.get(p, p) for p in product_names]
     else:
         products_to_match = product_names
     # Load categories
-    progress_tracker(0.2, desc="Loading categories...")
     categories = load_categories()
     # Create a mapping from original product names to expanded versions
@@ -37,13 +38,13 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
             product_to_expanded[product] = products_to_match[i]
     # Match products to categories
-    progress_tracker(0.3, desc="Matching products to categories...")
     match_results = match_products_to_categories(
         products_to_match,
         categories,
         top_n=int(top_n),
-        confidence_threshold=confidence_threshold,
-        progress=progress_tracker
     )
     # Create a new dictionary mapping original product names to their results
@@ -53,7 +54,7 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
             original_product_results[product] = match_results[expanded]
     # Format results
-    progress_tracker(0.9, desc="Formatting results...")
     output_html = "<div style='font-family: Arial, sans-serif; max-width: 100%; overflow-x: auto;'>"
     output_html += f"<p style='color: #555;'>Matched {len(product_names)} products to categories.</p>"
@@ -75,5 +76,5 @@ def categorize_products_by_category(product_input, is_file=False, use_expansion=
     if not match_results:
         output_html = "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>No results found. Please check your input or try different products.</div>"
-    progress_tracker(1.0, desc="Done!")
     return output_html

+# import gradio as gr # Removed Gradio import
+# from utils import SafeProgress # Removed SafeProgress import
 from category_matching import load_categories, match_products_to_categories
 from ui_core import parse_input
 from ui_formatters import format_categories_html
 def categorize_products_by_category(product_input, is_file=False, use_expansion=False, top_n=10, confidence_threshold=0.5):
     """Categorize products by matching them to predefined categories"""
+    # Removed Gradio progress tracking
+    # progress_tracker = SafeProgress(gr.Progress())
+    # progress_tracker(0, desc="Starting categorization...")
     # Parse input
     product_names, error = parse_input(product_input, is_file)
     # Optional description expansion
     expanded_descriptions = {}
     if use_expansion:
+        # progress_tracker(0.1, desc="Expanding product descriptions...") # Removed progress
+        expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
         # Use expanded descriptions for matching if available
         products_to_match = [expanded_descriptions.get(p, p) for p in product_names]
     else:
         products_to_match = product_names
     # Load categories
+    # progress_tracker(0.2, desc="Loading categories...") # Removed progress
     categories = load_categories()
     # Create a mapping from original product names to expanded versions
             product_to_expanded[product] = products_to_match[i]
     # Match products to categories
+    # progress_tracker(0.3, desc="Matching products to categories...") # Removed progress
     match_results = match_products_to_categories(
         products_to_match,
         categories,
         top_n=int(top_n),
+        confidence_threshold=confidence_threshold
+        # Removed progress argument
     )
     # Create a new dictionary mapping original product names to their results
             original_product_results[product] = match_results[expanded]
     # Format results
+    # progress_tracker(0.9, desc="Formatting results...") # Removed progress
     output_html = "<div style='font-family: Arial, sans-serif; max-width: 100%; overflow-x: auto;'>"
     output_html += f"<p style='color: #555;'>Matched {len(product_names)} products to categories.</p>"
     if not match_results:
         output_html = "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>No results found. Please check your input or try different products.</div>"
+    # progress_tracker(1.0, desc="Done!") # Removed progress
     return output_html
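
Review note: the hunks above comment out every `progress_tracker(...)` call, which removes the Gradio dependency but also drops progress reporting. One UI-agnostic alternative is an optional callback with a no-op default. The sketch below assumes a hypothetical `categorize_with_progress` function and a placeholder matching step; it does not reproduce `match_products_to_categories`.

```python
from typing import Callable, Dict, List, Optional

ProgressFn = Callable[[float, str], None]


def _no_progress(fraction: float, desc: str) -> None:
    """Default reporter that ignores every update."""


def categorize_with_progress(product_names: List[str],
                             progress: Optional[ProgressFn] = None) -> Dict[str, list]:
    """Hypothetical stand-in for a backend function that reports its stages."""
    report = progress if progress is not None else _no_progress

    report(0.0, "Starting categorization...")
    report(0.3, "Matching products to categories...")
    # Placeholder for the real matching call (not part of the original code).
    results = {name: [] for name in product_names}
    report(1.0, "Done!")
    return results


# A caller that wants updates passes any function with this signature.
events = []
results = categorize_with_progress(
    ["chocolate chips"],
    progress=lambda fraction, desc: events.append((fraction, desc)),
)
```

Called without `progress`, the function runs silently, so the backend modules never need to import a UI library. A Streamlit caller could pass a function that forwards each update to a progress bar.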

ui_expanded_matching.py CHANGED

@@ -1,5 +1,5 @@
-import gradio as gr
-from utils import SafeProgress
 from embeddings import create_product_embeddings
 from similarity import compute_similarities
 from openai_expansion import expand_product_descriptions
@@ -11,12 +11,13 @@ import json
 def categorize_products_with_openai_reranking(product_input, is_file=False, use_expansion=False,
                                            embedding_top_n=20, top_n=10, confidence_threshold=0.5,
-                                           match_type="ingredients", progress=gr.Progress()):
     """
     Categorize products using OpenAI reranking with optional description expansion
     """
-    progress_tracker = SafeProgress(progress)
-    progress_tracker(0, desc="Starting OpenAI reranking...")
     # Parse input
     product_names, error = parse_input(product_input, is_file)
     if error:
@@ -28,8 +29,8 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
     # Optional description expansion
     expanded_descriptions = {}
     if use_expansion:
-        progress_tracker(0.2, desc="Expanding product descriptions...")
-        expanded_descriptions = expand_product_descriptions(product_names, progress=progress)
     # Get shared OpenAI client
     openai_client = get_openai_client()
@@ -38,13 +39,13 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
     if match_type == "ingredients":
         # Generate product embeddings
-        progress_tracker(0.4, desc="Generating product embeddings...")
         if use_expansion and expanded_descriptions:
             # Use expanded descriptions for embedding creation when available
             products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
             # Map expanded descriptions back to original product names for consistent keys
             product_embeddings = {}
-            temp_embeddings = create_product_embeddings(products_for_embedding, progress=progress)
             # Ensure we use original product names as keys
             for i, product_name in enumerate(product_names):
@@ -52,10 +53,10 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
                     product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
         else:
             # Standard embedding creation with just product names
-            product_embeddings = create_product_embeddings(product_names, progress=progress)
         # Compute embedding similarities for ingredients
-        progress_tracker(0.6, desc="Computing ingredient similarities...")
         all_similarities = compute_similarities(embeddings, product_embeddings)
         print(f"product_names: {product_names}")
@@ -65,7 +66,7 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
         if not all_similarities:
             return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No similarities found. Please try different product names.</div>"
-        progress_tracker(0.7, desc="Re-ranking with OpenAI...")
         # Function for processing each product
         def process_reranking(product):
@@ -104,29 +105,26 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
         final_results = process_in_parallel(
             items=product_names,
             processor_func=process_reranking,
-            max_workers=min(10, len(product_names)),
-            progress_tracker=progress_tracker,
-            progress_start=0.7,
-            progress_end=0.9,
-            progress_desc="Re-ranking"
-        )
     else:  # categories
         # Load category embeddings instead of JSON categories
-        progress_tracker(0.5, desc="Loading category embeddings...")
         category_embeddings = load_category_embeddings()
         if not category_embeddings:
             return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No category embeddings found. Please check that the embeddings file exists at data/category_embeddings.pickle.</div>"
         # Generate product embeddings
-        progress_tracker(0.6, desc="Generating product embeddings...")
         if use_expansion and expanded_descriptions:
             # Use expanded descriptions for embedding creation when available
             products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
             # Map expanded descriptions back to original product names for consistent keys
             product_embeddings = {}
-            temp_embeddings = create_product_embeddings(products_for_embedding, progress=progress)
             # Ensure we use original product names as keys
             for i, product_name in enumerate(product_names):
@@ -134,10 +132,10 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
                     product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
         else:
             # Standard embedding creation with just product names
-            product_embeddings = create_product_embeddings(product_names, progress=progress)
         # Compute embedding similarities for categories
-        progress_tracker(0.7, desc="Computing category similarities...")
         all_similarities = compute_similarities(category_embeddings, product_embeddings)
         if not all_similarities:
@@ -150,7 +148,7 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
                 needed_category_ids.add(category_id)
         # Load only the needed categories from JSON
-        progress_tracker(0.75, desc="Loading category descriptions...")
         category_descriptions = {}
         if needed_category_ids:
             try:
@@ -211,15 +209,12 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
         final_results = process_in_parallel(
             items=product_names,
             processor_func=process_category_matching,
-            max_workers=min(10, len(product_names)),
-            progress_tracker=progress_tracker,
-            progress_start=0.7,
-            progress_end=0.9,
-            progress_desc="Category matching"
-        )
     # Format results
-    progress_tracker(0.9, desc="Formatting results...")
     # Create a list of result dictionaries in consistent format
     formatted_results = []
@@ -259,5 +254,5 @@ def categorize_products_with_openai_reranking(product_input, is_file=False, use_
         confidence_threshold=confidence_threshold  # Pass the threshold to the formatter
     )
-    progress_tracker(1.0, desc="Done!")
     return result_html
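
Review note: both branches above embed the expanded descriptions and then store the vectors under the original product names. A minimal sketch of that re-keying step follows. `rekey_embeddings` is a hypothetical helper, the membership check is an assumption (the condition inside the loop is not visible in these hunks), and short lists stand in for real embedding vectors.

```python
from typing import Dict, List, Sequence


def rekey_embeddings(product_names: Sequence[str],
                     texts_for_embedding: Sequence[str],
                     embeddings_by_text: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Map embeddings keyed by embedded text back to original product names."""
    rekeyed = {}
    for name, text in zip(product_names, texts_for_embedding):
        # Assumed guard: skip products whose text produced no embedding.
        if text in embeddings_by_text:
            rekeyed[name] = embeddings_by_text[text]
    return rekeyed


product_names = ["relish", "parsley"]
expanded = {"relish": "sweet pickle relish, a condiment"}
# Fall back to the original name when no expansion exists, as in the diff.
texts = [expanded.get(name, name) for name in product_names]
temp_embeddings = {
    "sweet pickle relish, a condiment": [0.1, 0.2],
    "parsley": [0.3, 0.4],
}
product_embeddings = rekey_embeddings(product_names, texts, temp_embeddings)
```

Keeping the original names as keys means downstream formatting can display what the user typed, regardless of whether expansion was enabled.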

+# import gradio as gr # Removed Gradio import
+# from utils import SafeProgress # Removed SafeProgress import
 from embeddings import create_product_embeddings
 from similarity import compute_similarities
 from openai_expansion import expand_product_descriptions
 def categorize_products_with_openai_reranking(product_input, is_file=False, use_expansion=False,
                                            embedding_top_n=20, top_n=10, confidence_threshold=0.5,
+                                           match_type="ingredients"): # Removed progress parameter
     """
     Categorize products using OpenAI reranking with optional description expansion
     """
+    # Removed Gradio progress tracking
+    # progress_tracker = SafeProgress(progress)
+    # progress_tracker(0, desc="Starting OpenAI reranking...")
     # Parse input
     product_names, error = parse_input(product_input, is_file)
     if error:
     # Optional description expansion
     expanded_descriptions = {}
     if use_expansion:
+        # progress_tracker(0.2, desc="Expanding product descriptions...") # Removed progress
+        expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
     # Get shared OpenAI client
     openai_client = get_openai_client()
     if match_type == "ingredients":
         # Generate product embeddings
+        # progress_tracker(0.4, desc="Generating product embeddings...") # Removed progress
         if use_expansion and expanded_descriptions:
             # Use expanded descriptions for embedding creation when available
             products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
             # Map expanded descriptions back to original product names for consistent keys
             product_embeddings = {}
+            temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names
             # Ensure we use original product names as keys
             for i, product_name in enumerate(product_names):
                     product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
         else:
             # Standard embedding creation with just product names
+            product_embeddings = create_product_embeddings(product_names) # Removed progress
         # Compute embedding similarities for ingredients
+        # progress_tracker(0.6, desc="Computing ingredient similarities...") # Removed progress
         all_similarities = compute_similarities(embeddings, product_embeddings)
         print(f"product_names: {product_names}")
         if not all_similarities:
             return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No similarities found. Please try different product names.</div>"
+        # progress_tracker(0.7, desc="Re-ranking with OpenAI...") # Removed progress
         # Function for processing each product
         def process_reranking(product):
         final_results = process_in_parallel(
             items=product_names,
             processor_func=process_reranking,
+            max_workers=min(10, len(product_names)) # Moved max_workers inside
+            # Removed progress tracking arguments
+        ) # Corrected closing parenthesis
     else:  # categories
         # Load category embeddings instead of JSON categories
+        # progress_tracker(0.5, desc="Loading category embeddings...") # Removed progress
         category_embeddings = load_category_embeddings()
         if not category_embeddings:
             return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No category embeddings found. Please check that the embeddings file exists at data/category_embeddings.pickle.</div>"
         # Generate product embeddings
+        # progress_tracker(0.6, desc="Generating product embeddings...") # Removed progress
         if use_expansion and expanded_descriptions:
             # Use expanded descriptions for embedding creation when available
             products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
             # Map expanded descriptions back to original product names for consistent keys
             product_embeddings = {}
+            temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names
             # Ensure we use original product names as keys
             for i, product_name in enumerate(product_names):
                     product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
         else:
             # Standard embedding creation with just product names
+            product_embeddings = create_product_embeddings(product_names) # Removed progress
         # Compute embedding similarities for categories
+        # progress_tracker(0.7, desc="Computing category similarities...") # Removed progress
         all_similarities = compute_similarities(category_embeddings, product_embeddings)
         if not all_similarities:
                 needed_category_ids.add(category_id)
         # Load only the needed categories from JSON
+        # progress_tracker(0.75, desc="Loading category descriptions...") # Removed progress
         category_descriptions = {}
         if needed_category_ids:
             try:
         final_results = process_in_parallel(
             items=product_names,
             processor_func=process_category_matching,
+            max_workers=min(10, len(product_names)) # Restored max_workers inside the call
+            # Removed progress tracking arguments
+        ) # Correctly placed closing parenthesis
     # Format results
+    # progress_tracker(0.9, desc="Formatting results...") # Removed progress
     # Create a list of result dictionaries in consistent format
     formatted_results = []
         confidence_threshold=confidence_threshold  # Pass the threshold to the formatter
     )
+    # progress_tracker(1.0, desc="Done!") # Removed progress
     return result_html
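
Review note: both `process_in_parallel` calls above now pass `max_workers=min(10, len(product_names))`. The helper itself is not part of this diff, so the following is only a minimal sketch of how such a helper could behave without progress arguments, assuming it is built on `concurrent.futures.ThreadPoolExecutor` and returns a dict keyed by item. The name `process_in_parallel_sketch` and the return shape are assumptions, not the repository's implementation. The sketch also guards the worker count, because `ThreadPoolExecutor` raises `ValueError` when `max_workers` is 0, which `min(10, len(product_names))` would produce for an empty list.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_in_parallel_sketch(items, processor_func, max_workers=10):
    """Hypothetical stand-in: run processor_func over items in threads.

    Returns {item: result}. Items are assumed to be hashable (e.g. product names).
    """
    if not items:
        # Nothing to do; also avoids creating a pool with zero workers.
        return {}

    # Never ask for more workers than items, and never fewer than one.
    workers = max(1, min(max_workers, len(items)))

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(processor_func, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # One failing item should not abort the whole batch.
                print(f"Error processing {item}: {exc}")
                results[item] = []
    return results
```

With a helper shaped like this, the caller could pass a plain upper bound such as `max_workers=10` and leave the clamping to the helper.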

ui_hybrid_matching.py CHANGED

@@ -1,5 +1,5 @@
-import gradio as gr
-from utils import SafeProgress
 from category_matching import load_categories, hybrid_category_matching
 from similarity import hybrid_ingredient_matching, compute_similarities
 from ui_core import embeddings, parse_input
@@ -9,12 +9,13 @@ from api_utils import get_voyage_client
 def categorize_products_with_voyage_reranking(product_input, is_file=False, use_expansion=False,
                                              embedding_top_n=20, final_top_n=5, confidence_threshold=0.5,
-                                             match_type="categories", progress=gr.Progress()):
     """
     Categorize products using Voyage reranking with optional description expansion
     """
-    progress_tracker = SafeProgress(progress)
-    progress_tracker(0, desc=f"Starting Voyage reranking for {match_type}...")
     # Parse input
     product_names, error = parse_input(product_input, is_file)
@@ -24,24 +25,24 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
     # Optional description expansion
     expanded_descriptions = {}
     if use_expansion:
-        progress_tracker(0.3, desc="Expanding product descriptions...")
-        expanded_descriptions = expand_product_descriptions(product_names, progress=progress)
     match_results = {}
     if match_type == "categories":
         # Load categories
-        progress_tracker(0.2, desc="Loading categories...")
         categories = load_categories()
         # Use hybrid approach for categories with optional expanded descriptions
-        progress_tracker(0.5, desc="Finding and re-ranking categories...")
         match_results = hybrid_category_matching(
             product_names, categories,
             embedding_top_n=int(embedding_top_n),
-            final_top_n=int(final_top_n),
             confidence_threshold=0.0,  # Don't apply threshold here - do it in display
-            expanded_descriptions=expanded_descriptions if use_expansion else None,
-            progress=progress
         )
     else:  # ingredients
         # Validate embeddings are loaded
@@ -49,18 +50,18 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
             return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No ingredient embeddings loaded. Please check that the embeddings file exists and is properly formatted.</div>"
         # Use hybrid approach for ingredients with optional expanded descriptions
-        progress_tracker(0.5, desc="Finding and re-ranking ingredients...")
         match_results = hybrid_ingredient_matching(
             product_names, embeddings,
             embedding_top_n=int(embedding_top_n),
-            final_top_n=int(final_top_n),
             confidence_threshold=0.0,  # Don't apply threshold here - do it in display
-            expanded_descriptions=expanded_descriptions if use_expansion else None,
-            progress=progress
         )
     # Format results
-    progress_tracker(0.9, desc="Formatting results...")
     # Convert to unified format for formatter
     formatted_results = []
@@ -109,7 +110,7 @@ def categorize_products_with_voyage_reranking(product_input, is_file=False, use_
         confidence_threshold=confidence_threshold  # Pass the threshold to the formatter
     )
-    progress_tracker(1.0, desc="Done!")
     return result_html
 # Update the function in ui_hybrid_matching.py
@@ -117,13 +118,14 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
                                      embedding_top_n=20, final_top_n=5,
                                      confidence_threshold=0.5,
                                      expanded_descriptions=None,
-                                     progress=None):
     """Use Voyage AI for reranking instead of OpenAI"""
-    from utils import SafeProgress
     from embeddings import create_product_embeddings
-    progress_tracker = SafeProgress(progress, desc="Voyage ingredient matching")
-    progress_tracker(0.1, desc="Stage 1: Finding candidates with embeddings")
     # Stage 1: Same as before - use embeddings to find candidates
     if expanded_descriptions:
@@ -131,7 +133,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
         products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
         # Map expanded descriptions back to original product names for consistent keys
         product_embeddings = {}
-        temp_embeddings = create_product_embeddings(products_for_embedding, progress=progress_tracker)
         # Ensure we use original product names as keys
         for i, product_name in enumerate(products):
@@ -139,7 +141,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
                 product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
     else:
         # Standard embedding creation with just product names
-        product_embeddings = create_product_embeddings(products, progress=progress_tracker)
     similarities = compute_similarities(ingredients_dict, product_embeddings)
@@ -148,7 +150,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
     for product, product_similarities in similarities.items():
         embedding_results[product] = product_similarities[:embedding_top_n]
-    progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI")
     # Initialize Voyage client
     voyage_client = get_voyage_client()
@@ -157,7 +159,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
     final_results = {}
     for i, product in enumerate(products):
-        progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}")
         if product not in embedding_results or not embedding_results[product]:
             final_results[product] = []
@@ -197,7 +199,7 @@ def hybrid_ingredient_matching_voyage(products, ingredients_dict,
             # Fall back to embedding results
             final_results[product] = candidates[:1]
-    progress_tracker(1.0, desc="Voyage ingredient matching complete")
     return final_results
 # Add this function to ui_hybrid_matching.py
@@ -206,13 +208,14 @@ def hybrid_category_matching_voyage(products, categories_dict,
                                    embedding_top_n=20, final_top_n=5,
                                    confidence_threshold=0.5,
                                    expanded_descriptions=None,
-                                   progress=None):
     """Use Voyage AI for reranking categories instead of OpenAI"""
-    from utils import SafeProgress
     from embeddings import create_product_embeddings
-    progress_tracker = SafeProgress(progress, desc="Voyage category matching")
-    progress_tracker(0.1, desc="Stage 1: Finding candidate categories with embeddings")
     # Stage 1: Same as before - use embeddings to find candidates
     if expanded_descriptions:
@@ -220,7 +223,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
         products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
         # Map expanded descriptions back to original product names for consistent keys
         product_embeddings = {}
-        temp_embeddings = create_product_embeddings(products_for_embedding, progress=progress_tracker)
         # Ensure we use original product names as keys
         for i, product_name in enumerate(products):
@@ -228,7 +231,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
                 product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
     else:
         # Standard embedding creation with just product names
-        product_embeddings = create_product_embeddings(products, progress=progress_tracker)
     from similarity import compute_similarities
     similarities = compute_similarities(categories_dict, product_embeddings)
@@ -238,7 +241,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
     for product, product_similarities in similarities.items():
         embedding_results[product] = product_similarities[:embedding_top_n]
-    progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI")
     # Initialize Voyage client
     voyage_client = get_voyage_client()
@@ -246,7 +249,7 @@ def hybrid_category_matching_voyage(products, categories_dict,
     # Stage 2: Re-rank using Voyage AI
     final_results = {}
     for i, product in enumerate(products):
-        progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}")
         if product not in embedding_results or not embedding_results[product]:
             final_results[product] = []
@@ -286,5 +289,5 @@ def hybrid_category_matching_voyage(products, categories_dict,
             # Fall back to embedding results
             final_results[product] = candidates[:1]
-    progress_tracker(1.0, desc="Voyage category matching complete")
     return final_results

+# import gradio as gr # Removed Gradio import
+# from utils import SafeProgress # Removed SafeProgress import
 from category_matching import load_categories, hybrid_category_matching
 from similarity import hybrid_ingredient_matching, compute_similarities
 from ui_core import embeddings, parse_input
 def categorize_products_with_voyage_reranking(product_input, is_file=False, use_expansion=False,
                                              embedding_top_n=20, final_top_n=5, confidence_threshold=0.5,
+                                             match_type="categories"): # Removed progress parameter
     """
     Categorize products using Voyage reranking with optional description expansion
     """
+    # Removed Gradio progress tracking
+    # progress_tracker = SafeProgress(progress)
+    # progress_tracker(0, desc=f"Starting Voyage reranking for {match_type}...")
     # Parse input
     product_names, error = parse_input(product_input, is_file)
     # Optional description expansion
     expanded_descriptions = {}
     if use_expansion:
+        # progress_tracker(0.3, desc="Expanding product descriptions...") # Removed progress
+        expanded_descriptions = expand_product_descriptions(product_names) # Removed progress argument
     match_results = {}
     if match_type == "categories":
         # Load categories
+        # progress_tracker(0.2, desc="Loading categories...") # Removed progress
         categories = load_categories()
         # Use hybrid approach for categories with optional expanded descriptions
+        # progress_tracker(0.5, desc="Finding and re-ranking categories...") # Removed progress
         match_results = hybrid_category_matching(
             product_names, categories,
             embedding_top_n=int(embedding_top_n),
+            final_top_n=int(final_top_n),
             confidence_threshold=0.0,  # Don't apply threshold here - do it in display
+            expanded_descriptions=expanded_descriptions if use_expansion else None
+            # Removed progress argument
         )
     else:  # ingredients
         # Validate embeddings are loaded
             return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: No ingredient embeddings loaded. Please check that the embeddings file exists and is properly formatted.</div>"
         # Use hybrid approach for ingredients with optional expanded descriptions
+        # progress_tracker(0.5, desc="Finding and re-ranking ingredients...") # Removed progress
         match_results = hybrid_ingredient_matching(
             product_names, embeddings,
             embedding_top_n=int(embedding_top_n),
+            final_top_n=int(final_top_n),
             confidence_threshold=0.0,  # Don't apply threshold here - do it in display
+            expanded_descriptions=expanded_descriptions if use_expansion else None
+            # Removed progress argument
         )
     # Format results
+    # progress_tracker(0.9, desc="Formatting results...") # Removed progress
     # Convert to unified format for formatter
     formatted_results = []
         confidence_threshold=confidence_threshold  # Pass the threshold to the formatter
     )
+    # progress_tracker(1.0, desc="Done!") # Removed progress
     return result_html
 # Update the function in ui_hybrid_matching.py
                                      embedding_top_n=20, final_top_n=5,
                                      confidence_threshold=0.5,
                                      expanded_descriptions=None,
+                                     ): # Removed progress parameter
     """Use Voyage AI for reranking instead of OpenAI"""
+    # from utils import SafeProgress # Removed SafeProgress import
     from embeddings import create_product_embeddings
+    # Removed Gradio progress tracking
+    # progress_tracker = SafeProgress(progress, desc="Voyage ingredient matching")
+    # progress_tracker(0.1, desc="Stage 1: Finding candidates with embeddings")
     # Stage 1: Same as before - use embeddings to find candidates
     if expanded_descriptions:
         products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
         # Map expanded descriptions back to original product names for consistent keys
         product_embeddings = {}
+        temp_embeddings = create_product_embeddings(products_for_embedding, original_products=products) # Removed progress, pass original names
         # Ensure we use original product names as keys
         for i, product_name in enumerate(products):
                 product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
     else:
         # Standard embedding creation with just product names
+        product_embeddings = create_product_embeddings(products) # Removed progress
     similarities = compute_similarities(ingredients_dict, product_embeddings)
     for product, product_similarities in similarities.items():
         embedding_results[product] = product_similarities[:embedding_top_n]
+    # progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI") # Removed progress
     # Initialize Voyage client
     voyage_client = get_voyage_client()
     final_results = {}
     for i, product in enumerate(products):
+        # progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}") # Removed progress
         if product not in embedding_results or not embedding_results[product]:
             final_results[product] = []
             # Fall back to embedding results
             final_results[product] = candidates[:1]
+    # progress_tracker(1.0, desc="Voyage ingredient matching complete") # Removed progress
     return final_results
 # Add this function to ui_hybrid_matching.py
                                    embedding_top_n=20, final_top_n=5,
                                    confidence_threshold=0.5,
                                    expanded_descriptions=None,
+                                   ): # Removed progress parameter
     """Use Voyage AI for reranking categories instead of OpenAI"""
+    # from utils import SafeProgress # Removed SafeProgress import
     from embeddings import create_product_embeddings
+    # Removed Gradio progress tracking
+    # progress_tracker = SafeProgress(progress, desc="Voyage category matching")
+    # progress_tracker(0.1, desc="Stage 1: Finding candidate categories with embeddings")
     # Stage 1: Same as before - use embeddings to find candidates
     if expanded_descriptions:
         products_for_embedding = [expanded_descriptions.get(name, name) for name in products]
         # Map expanded descriptions back to original product names for consistent keys
         product_embeddings = {}
+        temp_embeddings = create_product_embeddings(products_for_embedding, original_products=products) # Removed progress, pass original names
         # Ensure we use original product names as keys
         for i, product_name in enumerate(products):
                 product_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
     else:
         # Standard embedding creation with just product names
+        product_embeddings = create_product_embeddings(products) # Removed progress
     from similarity import compute_similarities
     similarities = compute_similarities(categories_dict, product_embeddings)
     for product, product_similarities in similarities.items():
         embedding_results[product] = product_similarities[:embedding_top_n]
+    # progress_tracker(0.4, desc="Stage 2: Re-ranking with Voyage AI") # Removed progress
     # Initialize Voyage client
     voyage_client = get_voyage_client()
     # Stage 2: Re-rank using Voyage AI
     final_results = {}
     for i, product in enumerate(products):
+        # progress_tracker((0.4 + 0.5 * i / len(products)), desc=f"Re-ranking: {product}") # Removed progress
         if product not in embedding_results or not embedding_results[product]:
             final_results[product] = []
             # Fall back to embedding results
             final_results[product] = candidates[:1]
+    # progress_tracker(1.0, desc="Voyage category matching complete") # Removed progress
     return final_results
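
Review note: the new calls pass `original_products=...` to `create_product_embeddings`, yet the loop that follows still looks up `temp_embeddings[products_for_embedding[i]]`, that is, by the expanded text. The change to `embeddings.py` is not visible here, so it is unclear whether the returned dict is keyed by the embedded text or by the original names. The sketch below is a hypothetical helper (`remap_to_original_names` is not in the repository) that accepts either keying and always returns a dict keyed by original product name.

```python
def remap_to_original_names(product_names, expanded_descriptions, temp_embeddings):
    """Key embeddings by original product name.

    temp_embeddings may be keyed by the text that was embedded (the expanded
    description) or by the original name; both cases are handled.
    """
    remapped = {}
    for name in product_names:
        # Same fallback the diff uses: expanded text if present, else the name.
        text = expanded_descriptions.get(name, name)
        if text in temp_embeddings:
            remapped[name] = temp_embeddings[text]
        elif name in temp_embeddings:
            remapped[name] = temp_embeddings[name]
        # Products with no embedding under either key are skipped.
    return remapped


# Example with toy vectors standing in for real embeddings.
names = ["oat milk", "tofu"]
expanded = {"oat milk": "plant-based milk made from oats"}
by_text = {"plant-based milk made from oats": [0.1, 0.2], "tofu": [0.3, 0.4]}
by_name = {"oat milk": [0.1, 0.2], "tofu": [0.3, 0.4]}

result_from_text_keys = remap_to_original_names(names, expanded, by_text)
result_from_name_keys = remap_to_original_names(names, expanded, by_name)
```

Since the same remapping block appears in four places in this commit, a shared helper along these lines would also remove the duplication.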

ui_ingredient_matching.py CHANGED

@@ -25,7 +25,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
     expanded_descriptions = {}
     if use_expansion:
         progress_tracker(0.2, desc="Expanding product descriptions...")
-        expanded_descriptions = expand_product_descriptions(product_names, progress=gr.Progress())
     # Create embeddings
     progress_tracker(0.4, desc="Generating product embeddings...")
@@ -34,7 +34,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
         products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
         # Map expanded descriptions back to original product names for consistent keys
         products_embeddings = {}
-        temp_embeddings = create_product_embeddings(products_for_embedding, progress=gr.Progress())
         # Ensure we use original product names as keys
         for i, product_name in enumerate(product_names):
@@ -42,7 +42,7 @@ def categorize_products(product_input, is_file=False, use_expansion=False, top_n
                 products_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
     else:
         # Standard embedding creation with just product names
-        products_embeddings = create_product_embeddings(product_names, progress=gr.Progress())
     if not products_embeddings:
         return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: Failed to generate product embeddings. Please try again with different product names.</div>"

     expanded_descriptions = {}
     if use_expansion:
         progress_tracker(0.2, desc="Expanding product descriptions...")
+        expanded_descriptions = expand_product_descriptions(product_names) # Removed progress
     # Create embeddings
     progress_tracker(0.4, desc="Generating product embeddings...")
         products_for_embedding = [expanded_descriptions.get(name, name) for name in product_names]
         # Map expanded descriptions back to original product names for consistent keys
         products_embeddings = {}
+        temp_embeddings = create_product_embeddings(products_for_embedding, original_products=product_names) # Removed progress, pass original names for keys
         # Ensure we use original product names as keys
         for i, product_name in enumerate(product_names):
                 products_embeddings[product_name] = temp_embeddings[products_for_embedding[i]]
     else:
         # Standard embedding creation with just product names
+        products_embeddings = create_product_embeddings(product_names) # Removed progress
     if not products_embeddings:
         return "<div style='color: #d32f2f; font-weight: bold; padding: 20px;'>Error: Failed to generate product embeddings. Please try again with different product names.</div>"
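
Review note: this file still calls `progress_tracker(...)` in unchanged lines, while the other files comment the tracker out entirely. One way to keep progress reporting without depending on a UI framework is an optional callback. The following is a minimal sketch under that assumption; `make_progress_tracker` and the `(fraction, desc)` callback signature are hypothetical and do not reproduce the repository's `SafeProgress`.

```python
from typing import Callable, Optional

# A progress callback receives a completion fraction in [0, 1] and a description.
ProgressCallback = Callable[[float, str], None]


def make_progress_tracker(callback: Optional[ProgressCallback] = None) -> ProgressCallback:
    """Return a tracker that is safe to call whether or not a UI is attached."""

    def tracker(fraction: float, desc: str = "") -> None:
        if callback is None:
            # No UI attached: progress updates become no-ops.
            return
        clamped = max(0.0, min(1.0, fraction))
        try:
            callback(clamped, desc)
        except Exception:
            # A failing progress widget should never break the matching pipeline.
            pass

    return tracker


# Example: collect updates in a list instead of drawing them.
updates = []
progress_tracker = make_progress_tracker(lambda fraction, desc: updates.append((fraction, desc)))
progress_tracker(0.2, desc="Expanding product descriptions...")
progress_tracker(1.5, desc="Done!")  # out-of-range values are clamped to 1.0

silent_tracker = make_progress_tracker()  # no callback: calls do nothing
silent_tracker(0.4, desc="Generating product embeddings...")
```

A Streamlit front end could supply a callback that updates its own progress bar, and the matching functions would not need to import any UI library.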