devjas1 committed
Commit 346c859 · 1 Parent(s): f7cba14

(CHORE)[Cleanup & Dependency Update]: Refine .gitignore, update requirements, and remove obsolete files


- Refactored .gitignore:
  - Removed duplicate entries and grouped patterns by category (Python, IDE, notebooks, Streamlit cache, model artifacts, data outputs, office docs).
  - Improved maintainability and clarity for future development.
- Updated requirements.txt:
  - Added new dependencies for async processing, batch utilities, performance tracking, model optimization, advanced UI, and Hugging Face/Streamlit compatibility (psutil, joblib, tenacity, async-lru, pyarrow, mermaid_cli, etc.).
  - Ensured compatibility with recent codebase enhancements.
- Removed obsolete documentation files (CODEBASE_INVENTORY.md, PIPELINE_ANALYSIS_REPORT.md) and deprecated Streamlit pages (Collaborative_Research.py, Educational_Interface.py) to streamline the repository.
- Added __pycache__.py as a placeholder for the Python cache directory to support a consistent environment setup.

.gitignore CHANGED
@@ -1,29 +1,121 @@
1
- # Ignore raw data and system clutter
2
-
3
- datasets/
4
  __pycache__/
5
  *.pyc
6
  .DS_store
7
- *.zip
8
  *.h5
9
  *.log
10
  *.env
11
  *.yml
12
  *.json
13
  *.sh
14
- .streamlit
15
- outputs/logs/
16
  docs/PROJECT_REPORT.md
17
- wea-*.txt
18
- sta-*.txt
19
  S3PR.md
20
 
21
 
22
- # --- Data (keep folder, ignore files) ---
23
- datasets/**
24
- !datasets/.gitkeep
25
- !datasets/.README.md
26
- # ---------------------------------------
27
-
28
- __pycache__.py
29
- outputs/performance_tracking.db
1
+ # =========================
2
+ # General Python & System
3
+ # =========================
4
  __pycache__/
5
  *.pyc
6
+ *.pyo
7
+ *.bak
8
+ *.tmp
9
+ *.swp
10
+ *.swo
11
+ *.orig
12
  .DS_store
13
+ Thumbs.db
14
+ ehthumbs.db
15
+ Desktop.ini
16
+
17
+ # =========================
18
+ # IDE & Editor Settings
19
+ # =========================
20
+ .vscode/
21
+ *.code-workspace
22
+
23
+ # =========================
24
+ # Jupyter Notebooks
25
+ # =========================
26
+ *.ipynb
27
+ .ipynb_checkpoints/
28
+
29
+ # =========================
30
+ # Streamlit Cache & Temp
31
+ # =========================
32
+ .streamlit/
33
+ **/.streamlit/
34
+ **/.streamlit_cache/
35
+ **/.streamlit_temp/
36
+
37
+ # =========================
38
+ # Virtual Environments & Build
39
+ # =========================
40
+ venv/
41
+ env/
42
+ .polymer_env/
43
+ *.egg-info/
44
+ dist/
45
+ build/
46
+
47
+ # =========================
48
+ # Test & Coverage Outputs
49
+ # =========================
50
+ htmlcov/
51
+ .coverage
52
+ .tox/
53
+ .cache/
54
+ pytest_cache/
55
+ *.cover
56
+
57
+ # =========================
58
+ # Data & Outputs
59
+ # =========================
60
+ datasets/
61
+ deferred/
62
+ outputs/logs/
63
+ outputs/performance_tracking.db
64
+ outputs/*.csv
65
+ outputs/*.json
66
+ outputs/*.png
67
+ outputs/*.jpg
68
+ outputs/*.pdf
69
+
70
+ # --- Data (keep folder, ignore files) ---
71
+ datasets/**
72
+ !datasets/.gitkeep
73
+ !datasets/.README.md
74
+
75
+ # =========================
76
+ # Model Artifacts
77
+ # =========================
78
+ *.pth
79
+ *.pt
80
+ *.ckpt
81
+ *.onnx
82
  *.h5
83
+
84
+ # =========================
85
+ # Miscellaneous Large/Export Files
86
+ # =========================
87
+ *.zip
88
+ *.gz
89
+ *.tar
90
+ *.tar.gz
91
+ *.rar
92
+ *.7z
93
  *.log
94
  *.env
95
  *.yml
96
  *.json
97
  *.sh
98
+ *.sqlite3
99
+ *.db
100
+
101
+ # =========================
102
+ # Documentation & Reports
103
+ # =========================
104
  docs/PROJECT_REPORT.md
105
  S3PR.md
106
 
107
+ # =========================
108
+ # Project-specific Data Files
109
+ # =========================
110
+ wea-*.txt
111
+ sta-*.txt
112
 
113
+ # =========================
114
+ # Office Documents
115
+ # =========================
116
+ *.xls
117
+ *.xlsx
118
+ *.ppt
119
+ *.pptx
120
+ *.doc
121
+ *.docx
CODEBASE_INVENTORY.md DELETED
@@ -1,435 +0,0 @@
1
- # Comprehensive Codebase Audit: Polymer Aging ML Platform
2
-
3
- ## Executive Summary
4
-
5
- This audit provides a technical inventory of the dev-jas/polymer-aging-ml repository—a modular machine learning platform for polymer degradation classification using Raman and FTIR spectroscopy. The system features robust error handling, multi-format batch processing, and persistent performance tracking, making it suitable for research, education, and industrial applications.
6
-
7
- ## 🏗️ System Architecture
8
-
9
- ### Core Infrastructure
10
-
11
- - **Streamlit-based web app** (`app.py`) as the main interface
12
- - **PyTorch** for deep learning
13
- - **Docker** for deployment
14
- - **SQLite** (`outputs/performance_tracking.db`) for performance metrics
15
- - **Plugin-based model registry** for extensibility
16
-
17
- ### Directory Structure
18
-
19
- - **app.py**: Main Streamlit application
20
- - **README.md**: Project documentation
21
- - **Dockerfile**: Containerization (Python 3.13-slim)
22
- - **requirements.txt**: Dependency management
23
- - **models/**: Neural network architectures and registry
24
- - **utils/**: Shared utilities (preprocessing, batch, results, performance, errors, confidence)
25
- - **scripts/**: CLI tools for training, inference, data management
26
- - **outputs/**: Model weights, inference results, performance DB
27
- - **sample_data/**: Demo spectrum files
28
- - **tests/**: Unit tests (PyTest)
29
- - **datasets/**: Data storage
30
- - **pages/**: Streamlit dashboard pages
31
-
32
- ## 🤖 Machine Learning Framework
33
-
34
- ### Model Registry
35
-
36
- Factory pattern in `models/registry.py` enables dynamic model selection:
37
-
38
- ```python
39
- _REGISTRY: Dict[str, Callable[[int], object]] = {
40
- "figure2": lambda L: Figure2CNN(input_length=L),
41
- "resnet": lambda L: ResNet1D(input_length=L),
42
- "resnet18vision": lambda L: ResNet18Vision(input_length=L)
43
- }
44
- ```
45
-
46
- ### Neural Network Architectures
47
-
48
- The platform supports three architectures, offering diverse options for spectral analysis:
49
-
50
- **Figure2CNN (Baseline Model):**
51
-
52
- - Architecture: 4 convolutional layers (1→16→32→64→128), 3 fully connected layers (256→128→2).
53
- - Performance: 94.80% accuracy, 94.30% F1-score (Raman-only).
54
- - Parameters: ~500K, supports dynamic input handling.
55
-
56
- **ResNet1D (Advanced Model):**
57
-
58
- - Architecture: 3 residual blocks with 1D skip connections.
59
- - Performance: 96.20% accuracy, 95.90% F1-score.
60
- - Parameters: ~100K, efficient via global average pooling.
61
-
62
- **ResNet18Vision (Experimental):**
63
-
64
- - Architecture: 1D-adapted ResNet-18 with 4 layers (2 blocks each).
65
- - Status: Under evaluation, ~11M parameters.
66
- - Opportunity: Expand validation for broader spectral applications.
67
-
68
- ## 🔧 Data Processing Infrastructure
69
-
70
- ### Preprocessing Pipeline
71
-
72
- The system implements a **modular preprocessing pipeline** in `utils/preprocessing.py` with five configurable stages:
73
- **1. Input Validation Framework:**
74
-
75
- - File format verification (`.txt` files exclusively)
76
- - Minimum data points validation (≥10 points required)
77
- - Wavenumber range validation (0-10,000 cm⁻¹ for Raman spectroscopy)
78
- - Monotonic sequence verification for spectral consistency
79
- - NaN value detection and automatic rejection
80
-
81
- **2. Core Processing Steps:**
82
-
83
- - **Linear Resampling**: Uniform grid interpolation to 500 points using `scipy.interpolate.interp1d`
84
- - **Baseline Correction**: Polynomial detrending (configurable degree, default=2)
85
- - **Savitzky-Golay Smoothing**: Noise reduction (window=11, order=2, configurable)
86
- **Min-Max Normalization**: Scaling to a fixed [0, 1] range with constant-signal protection (see the sketch below)
87
-
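For illustration only, here is a minimal sketch of those four core steps using NumPy and SciPy; the function name and defaults are assumptions for this example, not the actual `utils/preprocessing.py` implementation.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter

def preprocess_spectrum(x, y, target_len=500, baseline_degree=2,
                        smooth_window=11, smooth_order=2):
    """Resample, detrend, smooth, and min-max normalize one spectrum (illustrative)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)

    # 1. Linear resampling onto a uniform grid of target_len points
    x_new = np.linspace(x.min(), x.max(), target_len)
    y_new = interp1d(x, y, kind="linear")(x_new)

    # 2. Baseline correction: fit and subtract a low-degree polynomial trend
    y_new = y_new - np.polyval(np.polyfit(x_new, y_new, deg=baseline_degree), x_new)

    # 3. Savitzky-Golay smoothing (window=11, order=2 by default)
    y_new = savgol_filter(y_new, window_length=smooth_window, polyorder=smooth_order)

    # 4. Min-max normalization with constant-signal protection
    y_range = y_new.max() - y_new.min()
    y_new = np.zeros_like(y_new) if y_range == 0 else (y_new - y_new.min()) / y_range
    return x_new, y_new
```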
88
- ### Batch Processing Framework
89
-
90
- The `utils/multifile.py` module (12.5 kB) provides **enterprise-grade batch processing** capabilities:
91
-
92
- - **Multi-File Upload**: Streamlit widget supporting simultaneous file selection
93
- - **Error-Tolerant Processing**: Individual file failures don't interrupt batch operations
94
- - **Progress Tracking**: Real-time processing status with callback mechanisms
95
- - **Results Aggregation**: Comprehensive success/failure reporting with export options
96
- - **Memory Management**: Automatic cleanup between file processing iterations
97
-
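A minimal sketch of the error-tolerant batch pattern described above; the helper names (`process_one`, `progress_cb`) are placeholders for illustration and do not mirror the actual `utils/multifile.py` API.

```python
from typing import Callable, Iterable, Optional

def process_batch(files: Iterable, process_one: Callable,
                  progress_cb: Optional[Callable] = None) -> dict:
    """Process files one by one; a single failure never aborts the whole batch."""
    files = list(files)
    successes, failures = [], []
    for i, f in enumerate(files, start=1):
        name = getattr(f, "name", str(f))
        try:
            successes.append({"file": name, "result": process_one(f)})
        except Exception as exc:  # aggregate per-file errors instead of raising
            failures.append({"file": name, "error": str(exc)})
        if progress_cb:
            progress_cb(i, len(files))  # e.g. drive a Streamlit progress bar
    return {"successes": successes, "failures": failures}
```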
98
- ## 🖥️ User Interface Architecture
99
-
100
- ### Streamlit Application Design
101
-
102
- The main application implements a **sophisticated two-column layout** with comprehensive state management:[^1_2]
103
-
104
- **Left Column - Control Panel:**
105
-
106
- - **Model Selection**: Dropdown with real-time performance metrics display
107
- - **Input Modes**: Three processing modes (Single Upload, Batch Upload, Sample Data)
108
- - **Status Indicators**: Color-coded feedback system for user guidance
109
- - **Form Submission**: Validated input handling with disabled state management
110
-
111
- **Right Column - Results Display:**
112
-
113
- - **Tabbed Interface**: Details, Technical diagnostics, and Scientific explanation
114
- - **Interactive Visualization**: Confidence progress bars with color coding
115
- - **Spectrum Analysis**: Side-by-side raw vs. processed spectrum plotting
116
- - **Technical Diagnostics**: Model metadata, processing times, and debug logs
117
-
118
- ### State Management System
119
-
120
- The application employs **advanced session state management**:
121
-
122
- - Persistent state across Streamlit reruns using `st.session_state`
123
- - Intelligent caching with content-based hash keys for expensive operations
124
- - Memory cleanup protocols after inference operations
125
- - Version-controlled file uploader widgets to prevent state conflicts
126
-
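The content-based caching idea could look roughly like the following Streamlit sketch; the key prefix and function names are illustrative, not the application's actual ones.

```python
import hashlib
import streamlit as st

def content_key(raw_bytes: bytes, prefix: str = "inference") -> str:
    """Derive a cache key from the uploaded file's content, not its name."""
    return f"{prefix}:{hashlib.sha256(raw_bytes).hexdigest()}"

def cached_result(raw_bytes: bytes, expensive_fn):
    """Re-run the expensive step only when the underlying content changes."""
    key = content_key(raw_bytes)
    if key not in st.session_state:   # st.session_state persists across reruns
        st.session_state[key] = expensive_fn(raw_bytes)
    return st.session_state[key]
```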
127
- ## 🛠️ Utility Infrastructure
128
-
129
- ### Centralized Error Handling
130
-
131
- The `utils/errors.py` module provides centralized exception handling with **context-aware** logging and user-friendly error messages.
132
-
133
- ### Performance Tracking System
134
-
135
- The `utils/performance_tracker.py` module provides a robust system for logging and analyzing performance metrics.
136
-
137
- - **Database Logging**: Persists metrics to a SQLite database.
138
- - **Automated Tracking**: Uses a context manager to automatically track inference time, preprocessing time, and memory usage.
139
- - **Dashboarding**: Includes functions to generate performance visualizations and summary statistics for the UI.
140
-
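As a rough illustration of the context-manager approach, a tracker along these lines could time a block and append a row to SQLite; the table schema and column names here are invented for the example, not the actual `utils/performance_tracker.py` schema.

```python
import sqlite3
import time
from contextlib import contextmanager

@contextmanager
def track(db_path: str, stage: str):
    """Time the wrapped block and persist the elapsed seconds to SQLite."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        with sqlite3.connect(db_path) as conn:  # commits on successful exit
            conn.execute(
                "CREATE TABLE IF NOT EXISTS metrics (stage TEXT, seconds REAL, ts REAL)"
            )
            conn.execute(
                "INSERT INTO metrics (stage, seconds, ts) VALUES (?, ?, ?)",
                (stage, elapsed, time.time()),
            )

# Usage sketch:
#   with track("outputs/performance_tracking.db", "inference"):
#       logits = model(batch)
```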
141
- ### Enhanced Results Management
142
-
143
- The `utils/results_manager.py` module enables comprehensive session and persistent results tracking.
144
-
145
- - **In-Memory Storage**: Manages results for the current session.
146
- - **Multi-Model Handling**: Aggregates results from multiple models for comparison.
147
- - **Export Capabilities**: Exports results to CSV and JSON.
148
- - **Statistical Analysis**: Calculates accuracy, confidence, and other metrics.
149
-
150
- ## 📜 Command-Line Interface
151
-
152
- ### Training Pipeline
153
-
154
- The `scripts/train_model.py` module (6.27 kB) implements **robust model training**:
155
-
156
- **Cross-Validation Framework:**
157
-
158
- - 10-fold stratified cross-validation for unbiased evaluation
159
- - Model registry integration supporting all architectures
160
- - Configurable preprocessing via command-line flags
161
- - Comprehensive JSON logging with confusion matrices
162
-
163
- **Reproducibility Features:**
164
-
165
- - Fixed random seeds (SEED=42) across all random number generators
166
- - Deterministic CUDA operations when GPU available
167
- - Standardized train/validation splitting methodology
168
-
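For context, combining 10-fold stratified splits with a fixed seed typically looks like the sketch below (scikit-learn plus PyTorch); it is an assumption-laden illustration, not the actual `scripts/train_model.py` code.

```python
import random

import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold

SEED = 42

def seed_everything(seed: int = SEED) -> None:
    """Fix the relevant RNGs so folds and weight initialization are reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True  # deterministic CUDA kernels

def stratified_folds(X: np.ndarray, y: np.ndarray, n_splits: int = 10):
    """Yield (train_idx, val_idx) index pairs with class balance preserved per fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=SEED)
    yield from skf.split(X, y)
```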
169
- ### Data Utilities
170
-
171
- **File Discovery System:**
172
-
173
- - Recursive `.txt` file scanning with label extraction
174
- - Filename-based labeling convention (`sta-*` = stable, `wea-*` = weathered)
175
- - Dataset inventory generation with statistical summaries
176
-
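A minimal sketch of that discovery and labeling convention (filenames beginning with `sta-` or `wea-`); it is illustrative and not the repository's actual script.

```python
from pathlib import Path

def discover_spectra(root: str) -> list:
    """Recursively find .txt spectra and derive labels from filename prefixes."""
    records = []
    for path in Path(root).rglob("*.txt"):
        name = path.name.lower()
        if name.startswith("sta-"):
            label = "stable"
        elif name.startswith("wea-"):
            label = "weathered"
        else:
            label = "unknown"  # files outside the convention can be skipped or flagged
        records.append({"path": str(path), "label": label})
    return records
```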
177
- ### Dependency Management
178
-
179
- The `requirements.txt` specifies **core dependencies without version pinning**:[^1_12]
180
-
181
- - **Web Framework**: `streamlit` for interactive UI
182
- - **Deep Learning**: `torch`, `torchvision` for model execution
183
- - **Scientific Computing**: `numpy`, `scipy`, `scikit-learn` for data processing
184
- - **Visualization**: `matplotlib` for spectrum plotting
185
- - **API Framework**: `fastapi`, `uvicorn` for potential REST API expansion
186
-
187
- ## 🐳 Deployment Infrastructure
188
-
189
- ### Docker Configuration
190
-
191
- The Dockerfile uses Python 3.13-slim for efficient containerization:
192
-
193
- - Includes essential build tools and scientific libraries.
194
- - Supports health checks for container wellness.
195
- - **Roadmap**: Implement multi-stage builds and environment variables for streamlined deployments.
196
-
197
- ### Confidence Analysis System
198
-
199
- The `utils/confidence.py` module provides **scientific confidence metrics**:
200
-
201
- **Softmax-Based Confidence:**
202
-
203
- - Normalized probability distributions from model logits
204
- - Three-tier confidence levels: HIGH (≥80%), MEDIUM (≥60%), LOW (<60%)
205
- - Color-coded visual indicators with emoji representations
206
- - Legacy compatibility with logit margin calculations
207
-
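The three-tier thresholding described above can be sketched as follows; the emoji mapping is an assumption for illustration and may not match `utils/confidence.py` exactly.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def confidence_tier(logits) -> tuple:
    """Map the winning class probability to HIGH (>=80%), MEDIUM (>=60%), or LOW."""
    p = float(softmax(np.asarray(logits, dtype=float)).max())
    if p >= 0.80:
        return p, "HIGH", "🟢"
    if p >= 0.60:
        return p, "MEDIUM", "🟡"
    return p, "LOW", "🔴"
```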
208
- ### Session Results Management
209
-
210
- The `utils/results_manager.py` module (8.16 kB) enables **comprehensive session tracking**:
211
-
212
- - **In-Memory Storage**: Session-wide results persistence
213
- - **Export Capabilities**: CSV and JSON download with timestamp formatting
214
- - **Statistical Analysis**: Automatic accuracy calculation when ground truth available
215
- - **Data Integrity**: Results survive page refreshes within session boundaries
216
-
217
- ## 🧪 Testing Framework
218
-
219
- ### Test Infrastructure
220
-
221
- The `tests/` directory implements a **basic validation framework**:
222
-
223
- - **PyTest Configuration**: Centralized test settings in `conftest.py`
224
- - **Preprocessing Tests**: Core pipeline functionality validation in `test_preprocessing.py`
225
- - **Limited Coverage**: Currently covers preprocessing functions only
226
-
227
- **Testing Coming Soon:**
228
-
229
- - Add model architecture unit tests
230
- - Integration tests for UI components
231
- - Performance benchmarking tests
232
- - Improved error handling validation
233
-
234
- ## 🔍 Security \& Quality Assessment
235
-
236
- ### Input Validation Security
237
-
238
- **Robust Validation Framework:**
239
-
240
- - Strict file format enforcement preventing arbitrary file uploads
241
- - Content verification with numeric data type checking
242
- - Scientific range validation for spectroscopic data integrity
243
- - Memory safety through automatic cleanup and garbage collection
244
-
245
- ### Code Quality Metrics
246
-
247
- **Production Standards:**
248
-
249
- - **Type Safety**: Comprehensive type hints throughout codebase using Python 3.8+ syntax
250
- - **Documentation**: Inline docstrings following standard conventions
251
- - **Error Boundaries**: Multi-level exception handling with graceful degradation
252
- - **Logging**: Structured logging with appropriate severity levels
253
-
254
- ## 🚀 Extensibility Analysis
255
-
256
- ### Model Architecture Extensibility
257
-
258
- The **registry pattern enables seamless model addition**:
259
-
260
- 1. **Implementation**: Create new model class with standardized interface
261
- 2. **Registration**: Add to `models/registry.py` with factory function
262
- 3. **Integration**: Automatic UI and CLI support without code changes
263
- 4. **Validation**: Consistent input/output shape requirements
264
-
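Building on the registry snippet shown earlier, the four steps above reduce to one new entry plus a factory lambda. The sketch below is a self-contained toy version of that pattern; `MyNewCNN` and `build` are illustrative stand-ins, not the repository's exact code.

```python
from typing import Callable, Dict

import torch.nn as nn

class MyNewCNN(nn.Module):  # hypothetical model used only to illustrate step 1
    def __init__(self, input_length: int = 500):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 2),
        )

    def forward(self, x):  # x: (batch, 1, input_length)
        return self.net(x)

# Step 2: register a factory keyed by a short name
_REGISTRY: Dict[str, Callable[[int], object]] = {
    "mynewcnn": lambda L: MyNewCNN(input_length=L),
}

# Steps 3-4: callers resolve models by key, so UI/CLI integration needs no new code
def build(name: str, input_length: int = 500):
    return _REGISTRY[name](input_length)
```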
265
- ### Processing Pipeline Modularity
266
-
267
- **Configurable Architecture:**
268
-
269
- - Boolean flags control individual preprocessing steps
270
- - Easy integration of new preprocessing techniques
271
- - Backward compatibility through parameter defaulting
272
- - Single source of truth in `utils/preprocessing.py`
273
-
274
- ### Export \& Integration Capabilities
275
-
276
- **Multi-Format Support:**
277
-
278
- - CSV export for statistical analysis software
279
- - JSON export for programmatic integration
280
- - RESTful API potential through FastAPI foundation
281
- - Batch processing enabling high-throughput scenarios
282
-
283
- ## 📊 Performance Characteristics
284
-
285
- ### Computational Efficiency
286
-
287
- **Model Performance Metrics:**
288
-
289
- | Model | Parameters | Accuracy | F1-Score | Inference Time |
290
- | :------------- | :--------- | :--------------- | :--------------- | :--------------- |
291
- | Figure2CNN | ~500K | 94.80% | 94.30% | <1s per spectrum |
292
- | ResNet1D | ~100K | 96.20% | 95.90% | <1s per spectrum |
293
- | ResNet18Vision | ~11M | Under evaluation | Under evaluation | <2s per spectrum |
294
-
295
- **System Response Times:**
296
-
297
- - Single spectrum processing: <5 seconds end-to-end
298
- - Batch processing: Linear scaling with file count
299
- - Model loading: <3 seconds (cached after first load)
300
- - UI responsiveness: Real-time updates with progress indicators
301
-
302
- ### Memory Management
303
-
304
- **Optimization Strategies:**
305
-
306
- - Explicit garbage collection after inference operations[^1_2]
307
- - CUDA memory cleanup when GPU available
308
- - Session state pruning for long-running sessions
309
- - Caching with content-based invalidation
310
-
311
- ## 🔮 Strategic Development Roadmap
312
-
313
- The project roadmap has been updated to reflect recent progress:
314
-
315
- - [x] **FTIR Support**: Modular integration of FTIR spectroscopy is complete.
316
- - [x] **Multi-Model Dashboard**: A model comparison tab has been implemented.
317
- - [ ] **Image-based Inference**: Future work to include image-based polymer classification.
318
- - [x] **Performance Tracking**: A performance tracking dashboard has been implemented.
319
- - [ ] **Enterprise Integration**: Future work to include a RESTful API and more advanced database integration.
320
-
321
- ## 💼 Business Logic \& Scientific Workflow
322
-
323
- ### Classification Methodology
324
-
325
- **Binary Classification Framework:**
326
-
327
- - **Stable Polymers**: Well-preserved molecular structure suitable for recycling
328
- - **Weathered Polymers**: Oxidized bonds requiring additional processing
329
- - **Confidence Thresholds**: Scientific validation with visual indicators
330
- - **Ground Truth Validation**: Filename-based labeling for accuracy assessment
331
-
332
- ### Scientific Applications
333
-
334
- **Research Use Cases:**
335
-
336
- - Material science polymer degradation studies
337
- - Recycling viability assessment for circular economy
338
- - Environmental microplastic weathering analysis
339
- - Quality control in manufacturing processes
340
- - Longevity prediction for material aging
341
-
342
- ### Data Workflow Architecture
343
-
344
- ```text
345
- Input Validation → Spectrum Preprocessing → Model Inference →
346
- Confidence Analysis → Results Visualization → Export Options
347
- ```
348
-
349
- ## 🏁 Audit Conclusion
350
-
351
- This codebase represents a **well-architected, scientifically rigorous machine learning platform** with the following key characteristics:
352
-
353
- **Technical Excellence:**
354
-
355
- - Production-ready architecture with comprehensive error handling
356
- - Modular design supporting extensibility and maintainability
357
- - Scientific validation appropriate for spectroscopic data analysis
358
- - Clean separation between research functionality and production deployment
359
-
360
- **Scientific Rigor:**
361
-
362
- - Proper preprocessing pipeline validated for Raman spectroscopy
363
- - Multiple model architectures with performance benchmarking
364
- - Confidence metrics appropriate for scientific decision-making
365
- - Ground truth validation enabling accuracy assessment
366
-
367
- **Operational Readiness:**
368
-
369
- - Containerized deployment suitable for cloud platforms
370
- - Batch processing capabilities for high-throughput scenarios
371
- - Comprehensive export options for downstream analysis
372
- - Session management supporting extended research workflows
373
-
374
- **Development Quality:**
375
-
376
- - Type-safe Python implementation with modern language features
377
- - Comprehensive documentation supporting knowledge transfer
378
- - Modular architecture enabling team development
379
- - Testing framework foundation for continuous integration
380
-
381
- The platform successfully bridges academic research and practical application, providing both accessible web interface capabilities and automation-friendly command-line tools. The extensible architecture and comprehensive documentation indicate strong software engineering practices suitable for both research institutions and industrial applications.
382
-
383
- **Risk Assessment:** Low - The codebase demonstrates mature engineering practices with appropriate validation and error handling for production deployment.
384
-
385
- **Recommendation:** This platform is ready for production deployment, representing a solid foundation for polymer classification research and industrial applications.
386
-
387
- ### EXTRA
388
-
389
- ```text
390
- 1. Setup & Configuration (Lines 1-105)
391
- Imports: Standard libraries (os, sys, time), data science (numpy, torch, matplotlib), and Streamlit.
392
- Local Imports: Pulls from your existing utils and models directories.
393
- Constants: Global, hardcoded configuration variables.
394
- KEEP_KEYS: Defines which session state keys persist on reset.
395
- TARGET_LEN: A static preprocessing value.
396
- SAMPLE_DATA_DIR, MODEL_WEIGHTS_DIR: Path configurations.
397
- MODEL_CONFIG: A dictionary defining model paths, classes, and metadata.
398
- LABEL_MAP: A dictionary for mapping class indices to human-readable names.
399
- Page Setup:
400
- st.set_page_config(): Sets the browser tab title, icon, and layout.
401
- st.markdown(<style>...): A large, embedded multi-line string containing all the custom CSS for the application.
402
- 2. Core Logic & Data Processing (Lines 108-250)
403
- Model Handling:
404
- load_state_dict(): Cached function to load model weights from a file.
405
- load_model(): Cached resource to initialize a model class and load its weights.
406
- run_inference(): The main ML prediction function. It takes resampled data, loads the appropriate model, runs inference, and returns the results.
407
- Data I/O & Preprocessing:
408
- label_file(): Extracts the ground truth label from a filename.
409
- get_sample_files(): Lists the available .txt files in the sample data directory.
410
- parse_spectrum_data(): The crucial function for reading, validating, and parsing raw text input into numerical numpy arrays.
411
- Visualization:
412
- create_spectrum_plot(): Generates the "Raw vs. Resampled" matplotlib plot and returns it as an image.
413
- Helpers:
414
- cleanup_memory(): A utility for garbage collection.
415
- get_confidence_description(): Maps a logit margin to a human-readable confidence level.
416
- 3. State Management & Callbacks (Lines 253-335)
417
- Initialization:
418
- init_session_state(): The cornerstone of the app's state, defining all the default values in st.session_state.
419
- Widget Callbacks:
420
- on_sample_change(): Triggered when the user selects a sample file.
421
- on_input_mode_change(): Triggered by the main st.radio widget.
422
- on_model_change(): Triggered when the user selects a new model.
423
- Reset/Clear Functions:
424
- reset_results(): A soft reset that only clears inference artifacts.
425
- reset_ephemeral_state(): The "master reset" that clears almost all session state and forces a file uploader refresh.
426
- clear_batch_results(): A focused function to clear only the results in col2.
427
- 4. UI Rendering Components (Lines 338-End)
428
- Generic Components:
429
- render_kv_grid(): A reusable helper to display a dictionary in a neat grid.
430
- render_model_meta(): Renders the model's accuracy and F1 score in the sidebar.
431
- Main Application Layout (main()):
432
- Sidebar: Contains the header, model selector (st.selectbox), model metadata, and the "About" expander.
433
- Column 1 (Input): Contains the main st.radio for mode selection and the conditional logic to display the single file uploader, batch uploader, or sample selector. It also holds the "Run Analysis" and "Reset All" buttons.
434
- Column 2 (Results): Contains all the logic for displaying either the batch results or the detailed, tabbed results for a single file (Details, Technical, Explanation).
435
- ```
PIPELINE_ANALYSIS_REPORT.md DELETED
@@ -1,1016 +0,0 @@
1
- # ML Pipeline Analysis Report
2
-
3
- ## Executive Summary
4
-
5
- This report provides a comprehensive analysis of the machine learning pipeline for polymer degradation classification using Raman and FTIR spectroscopy data. The analysis focuses on codebase structure, data processing, feature extraction, model architecture, and specific UI bugs that impact functionality and user experience.
6
-
7
- ---
8
-
9
- ## Task 1: Codebase Structure Review
10
-
11
- ### Overview
12
-
13
- Analyzing the organization, dependencies, and UI integration of the polymer aging ML platform to understand its architecture and identify structural issues.
14
-
15
- ### Steps
16
-
17
- #### Step 1: Repository Structure Analysis
18
-
19
- **What**: Examined the overall codebase organization and file structure
20
- **How**: Explored directory structure, key modules, and dependencies across the entire repository
21
- **Why**: Understanding the architecture is essential for identifying bottlenecks and areas for improvement
22
-
23
- **Key Findings:**
24
-
25
- - **Modular Architecture**: Well-organized structure with separate modules for UI (`modules/`), models (`models/`), utilities (`utils/`), and preprocessing
26
- - **Streamlit-based UI**: Single-page application with tabbed interface (Standard Analysis, Model Comparison, Image Analysis, Performance Tracking)
27
- - **Model Registry System**: Centralized model management in `models/registry.py` with 6 available models
28
- - **Configuration Split**: Two configuration systems - `config.py` (legacy, 2 models) and `models/registry.py` (current, 6 models)
29
-
30
- #### Step 2: Dependency Analysis
31
-
32
- **What**: Reviewed imports, module relationships, and external dependencies
33
- **How**: Analyzed import statements, requirements.txt, and cross-module dependencies
34
- **Why**: Understanding dependencies helps identify potential conflicts and integration issues
35
-
36
- **Key Dependencies:**
37
-
38
- - **Core ML**: PyTorch, scikit-learn, NumPy, SciPy
39
- - **UI Framework**: Streamlit with custom styling
40
- - **Data Processing**: Pandas, matplotlib, seaborn for visualization
41
- - **Spectroscopy**: Custom preprocessing pipeline in `utils/preprocessing.py`
42
-
43
- #### Step 3: UI Integration Assessment
44
-
45
- **What**: Analyzed how UI components integrate with backend logic
46
- **How**: Examined `modules/ui_components.py`, `app.py`, and state management
47
- **Why**: UI-backend integration issues are the source of several reported bugs
48
-
49
- **Architecture Pattern:**
50
-
51
- - **Sidebar Controls**: Model selection, modality selection, input configuration
52
- - **Main Content**: Tabbed interface with distinct workflows
53
- - **State Management**: Streamlit session state with custom callback system
54
- - **Results Display**: Modular rendering with caching for performance
55
-
56
- ### Task 1 Findings
57
-
58
- **Strengths:**
59
-
60
- - Clean modular architecture with separation of concerns
61
- - Comprehensive model registry supporting multiple architectures
62
- - Robust preprocessing pipeline with modality-specific parameters
63
- - Good error handling and caching mechanisms
64
-
65
- **Critical Issues Identified:**
66
-
67
- 1. **Configuration Mismatch**: `config.py` defines only 2 models while `models/registry.py` has 6 models
68
- 2. **UI-Backend Disconnect**: Sidebar uses `MODEL_CONFIG` (2 models) instead of registry (6 models)
69
- 3. **Modality State Inconsistency**: Two separate modality selectors can have different values
70
- 4. **Missing Model Weights**: Model loading expects weight files that may not exist
71
-
72
- ### Task 1 Recommendations
73
-
74
- 1. **Unify Model Configuration**: Replace `config.py` MODEL_CONFIG with registry-based model selection
75
- 2. **Implement Consistent State Management**: Synchronize modality selection across UI components
76
- 3. **Add Model Availability Checks**: Dynamically show only models with available weights
77
- 4. **Improve Error Handling**: Better user feedback for missing dependencies or models
78
-
79
- ### Task 1 Reflection
80
-
81
- The codebase shows good architectural principles but suffers from evolution-related inconsistencies. The split between legacy configuration and new registry system is the root cause of several UI bugs. The modular design makes fixes straightforward once issues are identified.
82
-
83
- ### Transition to Next Task
84
-
85
- The structural analysis reveals that preprocessing is well-architected with modality-specific handling. Next, we'll examine the actual preprocessing implementation to assess effectiveness for Raman vs FTIR data.
86
-
87
- ---
88
-
89
- ## Task 2: Data Preprocessing Evaluation
90
-
91
- ### Overview
92
-
93
- Evaluating the preprocessing pipeline for both Raman and FTIR spectroscopy data to identify modality-specific issues and optimization opportunities.
94
-
95
- ### Steps
96
-
97
- #### Step 1: Preprocessing Pipeline Architecture Analysis
98
-
99
- **What**: Examined the preprocessing pipeline structure and modality handling
100
- **How**: Analyzed `utils/preprocessing.py` and related test files
101
- **Why**: Understanding the preprocessing flow is crucial for identifying performance bottlenecks and modality-specific issues
102
-
103
- **Pipeline Components:**
104
-
105
- 1. **Input Validation**: File format, data points, wavenumber range validation
106
- 2. **Resampling**: Linear interpolation to uniform 500-point grid
107
- 3. **Baseline Correction**: Polynomial detrending (configurable degree)
108
- 4. **Smoothing**: Savitzky-Golay filter for noise reduction
109
- 5. **Normalization**: Min-max scaling with constant-signal protection
110
- 6. **Modality-Specific Processing**: FTIR atmospheric and water vapor corrections
111
-
112
- #### Step 2: Modality-Specific Parameter Assessment
113
-
114
- **What**: Analyzed the different preprocessing parameters for Raman vs FTIR
115
- **How**: Examined `MODALITY_PARAMS` and `MODALITY_RANGES` configurations
116
- **Why**: Different spectroscopy techniques require different preprocessing approaches
117
-
118
- **Raman Parameters:**
119
-
120
- - Range: 200-4000 cm⁻¹ (typical Raman range)
121
- - Baseline degree: 2 (polynomial)
122
- - Smoothing window: 11 points
123
- - Cosmic ray removal: Disabled (potential issue)
124
-
125
- **FTIR Parameters:**
126
-
127
- - Range: 400-4000 cm⁻¹ (FTIR range)
128
- - Baseline degree: 2 (same as Raman)
129
- - Smoothing window: Different from Raman
130
- - Atmospheric correction: Available but optional
131
- - Water vapor correction: Available but optional
132
-
133
- #### Step 3: Validation and Quality Control Analysis
134
-
135
- **What**: Reviewed data quality assessment and validation mechanisms
136
- **How**: Examined `modules/enhanced_data_pipeline.py` quality controller
137
- **Why**: Data quality directly impacts model performance, especially for FTIR
138
-
139
- **Quality Metrics:**
140
-
141
- - Signal-to-noise ratio assessment
142
- - Baseline stability evaluation
143
- - Peak resolution analysis
144
- - Spectral range coverage validation
145
- - Instrumental artifact detection
146
-
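As a rough illustration of the first two checks, a simple estimate can compare the smoothed signal against its high-frequency residual; the method and thresholds here are assumptions, not the `modules/enhanced_data_pipeline.py` implementation.

```python
import numpy as np
from scipy.signal import savgol_filter

def estimate_quality(x: np.ndarray, y: np.ndarray) -> dict:
    """Crude signal-to-noise and baseline-drift estimates for one spectrum."""
    smooth = savgol_filter(y, window_length=11, polyorder=2)
    noise = y - smooth                          # high-frequency residual as noise proxy
    snr = float(np.std(smooth) / (np.std(noise) + 1e-12))

    # Baseline stability: drift of a linear fit relative to the overall peak height
    baseline = np.polyval(np.polyfit(x, y, deg=1), x)
    drift = float(np.ptp(baseline) / (np.ptp(y) + 1e-12))

    return {"snr": snr,
            "baseline_drift": drift,
            "range_cm1": (float(np.min(x)), float(np.max(x)))}
```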
147
- ### Task 2 Findings
148
-
149
- **Raman Preprocessing Strengths:**
150
-
151
- - Appropriate wavenumber range for Raman spectroscopy
152
- - Standard polynomial baseline correction effective for most Raman data
153
- - Savitzky-Golay smoothing parameters well-tuned
154
-
155
- **Raman Preprocessing Issues:**
156
-
157
- - **Cosmic Ray Removal Disabled**: Major issue for Raman data quality
158
- - **Fixed Parameters**: No adaptive preprocessing based on signal quality
159
- - **Limited Noise Handling**: Could benefit from more sophisticated denoising
160
-
161
- **FTIR Preprocessing Strengths:**
162
-
163
- - Modality-specific wavenumber range (400-4000 cm⁻¹)
164
- - Atmospheric interference correction available
165
- - Water vapor band correction implemented
166
-
167
- **FTIR Preprocessing Critical Issues:**
168
-
169
- 1. **Atmospheric Corrections Often Disabled**: Default configuration doesn't enable critical FTIR corrections
170
- 2. **Insufficient Baseline Correction**: FTIR often requires more aggressive baseline handling
171
- 3. **Limited CO₂/H₂O Handling**: Basic water vapor correction may be insufficient
172
- 4. **No Beer-Lambert Law Considerations**: FTIR absorbance data needs different normalization
173
-
174
- ### Task 2 Recommendations
175
-
176
- **For Raman Optimization:**
177
-
178
- 1. **Enable Cosmic Ray Removal**: Implement and activate cosmic ray spike detection/removal
179
- 2. **Adaptive Smoothing**: Dynamic smoothing parameters based on noise level
180
- 3. **Advanced Denoising**: Consider wavelet denoising for weak signals
181
-
182
- **For FTIR Enhancement:**
183
-
184
- 1. **Enable Atmospheric Corrections by Default**: Activate CO₂ and H₂O corrections
185
- 2. **Improved Baseline Correction**: Implement rubber-band or airPLS baseline correction
186
- 3. **Absorbance-Specific Normalization**: Use Beer-Lambert law appropriate scaling
187
- 4. **Region-of-Interest Selection**: Focus on chemically relevant wavenumber regions
188
-
189
- ### Task 2 Reflection
190
-
191
- The preprocessing pipeline is well-architected but conservative in its approach. Raman processing is adequate but misses cosmic ray removal - a critical step. FTIR processing has the right components but they're not properly enabled or optimized. The modular design makes improvements straightforward to implement.
192
-
193
- ### Transition to Next Task
194
-
195
- With preprocessing issues identified, we now examine feature extraction methods to understand why FTIR performance is poor compared to Raman and identify optimization opportunities.
196
-
197
- ---
198
-
199
- ## Task 3: Feature Extraction Assessment
200
-
201
- ### Overview
202
-
203
- Analyzing feature extraction methods for both modalities, focusing on why FTIR features are ineffective compared to Raman and identifying optimization strategies.
204
-
205
- ### Steps
206
-
207
- #### Step 1: Current Feature Extraction Analysis
208
-
209
- **What**: Examined how spectral features are extracted and used by ML models
210
- **How**: Analyzed model architectures, preprocessing outputs, and feature representation
211
- **Why**: Feature quality directly impacts model performance and explains modality-specific effectiveness
212
-
213
- **Current Approach:**
214
-
215
- - **Raw Spectral Features**: Direct use of preprocessed intensity values
216
- - **Uniform Sampling**: All spectra resampled to 500 points regardless of modality
217
- - **No Domain-Specific Features**: Missing peak detection, band identification, or chemical markers
218
- - **Generic Architecture**: Same CNN architecture for both Raman and FTIR
219
-
220
- #### Step 2: Raman Feature Effectiveness Analysis
221
-
222
- **What**: Assessed why Raman features work reasonably well
223
- **How**: Examined Raman spectroscopy characteristics and model performance
224
- **Why**: Understanding Raman success can guide FTIR improvements
225
-
226
- **Raman Advantages:**
227
-
228
- - **Sharp Peaks**: Raman provides distinct, narrow peaks suitable for CNN pattern recognition
229
- - **Molecular Vibrations**: Direct correlation between polymer degradation and spectral changes
230
- - **Less Background**: Raman typically has cleaner backgrounds than FTIR
231
- - **Consistent Baseline**: Raman baselines are generally more stable
232
-
233
- #### Step 3: FTIR Feature Ineffectiveness Analysis
234
-
235
- **What**: Investigated specific reasons for poor FTIR performance
236
- **How**: Analyzed FTIR characteristics, preprocessing limitations, and model architecture fit
237
- **Why**: Identifying root causes enables targeted improvements
238
-
239
- **FTIR Challenges:**
240
-
241
- 1. **Broad Absorption Bands**: FTIR features are broader and more overlapping than Raman peaks
242
- 2. **Atmospheric Interference**: CO₂ and H₂O bands mask important polymer signals
243
- 3. **Complex Baselines**: FTIR baselines drift more significantly than Raman
244
- 4. **Beer-Lambert Effects**: Absorbance intensity relates logarithmically to concentration
245
- 5. **Matrix Effects**: Sample preparation artifacts more pronounced in FTIR
246
-
247
- ### Task 3 Findings
248
-
249
- **Why FTIR Features Are Ineffective:**
250
-
251
- 1. **Inappropriate Preprocessing**:
252
-
253
- - Min-max normalization ignores Beer-Lambert law principles
254
- - Disabled atmospheric corrections leave interfering bands
255
- - Insufficient baseline correction for FTIR drift characteristics
256
-
257
- 2. **Suboptimal Feature Representation**:
258
-
259
- - 500-point uniform sampling doesn't emphasize chemically relevant regions
260
- - No derivative spectroscopy (essential for FTIR analysis)
261
- - Missing peak integration or band ratio calculations
262
-
263
- 3. **Architecture Mismatch**:
264
-
265
- - CNN architectures optimized for sharp Raman peaks
266
- - No attention mechanisms for broad FTIR absorption bands
267
- - Insufficient receptive field for FTIR's broader spectral features
268
-
269
- 4. **Missing Domain Knowledge**:
270
- - No chemical group identification (C=O, C-H, O-H bands)
271
- - Missing polymer-specific spectral markers
272
- - No weathering-related spectral indicators
273
-
274
- **Why Raman Works Better:**
275
-
276
- - Sharp peaks match CNN's pattern recognition strengths
277
- - More stable baselines require less aggressive preprocessing
278
- - Direct molecular vibration information
279
- - Less atmospheric interference
280
-
281
- ### Task 3 Recommendations
282
-
283
- **Immediate FTIR Improvements:**
284
-
285
- 1. **Enable FTIR-Specific Preprocessing**: Activate atmospheric corrections, improve baseline handling
286
- 2. **Implement Derivative Spectroscopy**: Add first/second derivatives to enhance peak resolution
287
- 3. **Region-of-Interest Focus**: Weight chemically relevant wavenumber regions more heavily
288
- 4. **Absorbance-Appropriate Normalization**: Use log-scale normalization respecting Beer-Lambert law
289
-
290
- **Advanced Feature Engineering:**
291
-
292
- 1. **Peak Detection and Integration**: Extract meaningful chemical band areas
293
- 2. **Band Ratio Calculations**: Calculate ratios indicative of polymer degradation
294
- 3. **Spectral Deconvolution**: Separate overlapping absorption bands
295
- 4. **Chemical Group Identification**: Automated detection of polymer functional groups
296
-
297
- **Architecture Modifications:**
298
-
299
- 1. **Multi-Scale CNNs**: Different receptive fields for broad vs narrow features
300
- 2. **Attention Mechanisms**: Focus on chemically relevant spectral regions
301
- 3. **Hybrid Models**: Combine CNN backbone with spectroscopy-specific layers
302
- 4. **Ensemble Approaches**: Separate models for different FTIR regions
303
-
304
- ### Task 3 Reflection
305
-
306
- The analysis reveals that FTIR's poor performance stems from treating it identically to Raman despite fundamental differences in spectroscopic principles. FTIR requires domain-specific preprocessing, feature extraction, and potentially different architectures. The current generic approach works for Raman's sharp peaks but fails for FTIR's broad bands.
307
-
308
- ### Transition to Next Task
309
-
310
- With feature extraction issues identified, we now analyze the ML models and training processes, particularly focusing on how the AI Model Selection UI integrates with the various architectures.
311
-
312
- ---
313
-
314
- ## Task 4: ML Models and Training Analysis
315
-
316
- ### Overview
317
-
318
- Evaluating the machine learning models, their architectures, training/validation processes, and integration with the AI Model Selection UI to identify performance and usability issues.
319
-
320
- ### Steps
321
-
322
- #### Step 1: Model Architecture Analysis
323
-
324
- **What**: Examined the available model architectures and their suitability for spectroscopy data
325
- **How**: Analyzed model classes in `models/` directory and registry specifications
326
- **Why**: Understanding model capabilities helps identify performance limitations and UI integration issues
327
-
328
- **Available Models in Registry (6 total):**
329
-
330
- 1. **figure2**: Baseline CNN (500K params, 94.8% accuracy)
331
- 2. **resnet**: ResNet1D with skip connections (100K params, 96.2% accuracy)
332
- 3. **resnet18vision**: Adapted ResNet18 (11M params, 94.5% accuracy)
333
- 4. **enhanced_cnn**: CNN with attention mechanisms (800K params, 97.5% accuracy)
334
- 5. **efficient_cnn**: Lightweight CNN (200K params, 95.5% accuracy)
335
- 6. **hybrid_net**: CNN-Transformer hybrid (1.2M params, 96.8% accuracy)
336
-
337
- **Models in UI Config (2 total):**
338
-
339
- - Only "Figure2CNN (Baseline)" and "ResNet1D (Advanced)" appear in sidebar
340
-
341
- #### Step 2: Training and Validation Process Assessment
342
-
343
- **What**: Analyzed model training methodology and validation approaches
344
- **How**: Examined training scripts, performance metrics, and validation procedures
345
- **Why**: Training quality affects model reliability and explains performance differences
346
-
347
- **Training Observations:**
348
-
349
- **Ground Truth Validation**: Filename-based labeling system (`sta-*` = stable, `wea-*` = weathered)
350
- - **Performance Tracking**: Comprehensive metrics tracking in `utils/performance_tracker.py`
351
- - **Cross-Validation**: Framework present but validation rigor unclear
352
- - **Hyperparameter Tuning**: Model-specific parameters but limited systematic optimization
353
-
354
- #### Step 3: AI Model Selection UI Integration Analysis
355
-
356
- **What**: Investigated how the UI integrates with the model registry and handles model selection
357
- **How**: Traced code flow from UI components through model loading to inference
358
- **Why**: UI-backend disconnection is causing major usability issues (Bug A)
359
-
360
- **Integration Flow:**
361
-
362
- 1. **Sidebar Selection**: Uses `MODEL_CONFIG` from `config.py` (2 models only)
363
- 2. **Model Loading**: `core_logic.py` expects specific weight file paths
364
- 3. **Registry System**: `models/registry.py` has 6 models but isn't used by UI
365
- 4. **Comparison Tab**: Uses registry correctly, causing inconsistency
366
-
367
- ### Task 4 Findings
368
-
369
- **Model Architecture Strengths:**
370
-
371
- - **Diverse Options**: Good variety from lightweight to transformer-based models
372
- - **Performance Range**: Models span efficiency vs accuracy trade-offs
373
- - **Modality Support**: All models claim Raman/FTIR compatibility
374
- - **Modern Architectures**: Includes attention mechanisms and hybrid approaches
375
-
376
- **Critical Integration Issues:**
377
-
378
- 1. **Bug A Root Cause - Configuration Split**:
379
-
380
- - Sidebar uses legacy `config.py` with only 2 models
381
- - Registry has 6 models but isn't connected to main UI
382
- - Model weights expected in specific paths that may not exist
383
-
384
- 2. **Model Loading Problems**:
385
-
386
- - Weight files may be missing (`model_weights/` or `outputs/` directory)
387
- - Error handling shows warnings but continues with random weights
388
- - No dynamic availability checking
389
-
390
- 3. **Inconsistent Performance Claims**:
391
- - Registry shows 97.5% accuracy for enhanced_cnn
392
- - Unclear if these are validated metrics or theoretical
393
- - No real-time performance validation
394
-
395
- **Training and Validation Issues:**
396
-
397
- 1. **Limited Validation Rigor**: Simple filename-based ground truth may be insufficient
398
- 2. **No Cross-Modal Validation**: Models trained/tested on same modality data
399
- 3. **Missing Baseline Comparisons**: No systematic comparison with traditional methods
400
- 4. **Insufficient Hyperparameter Search**: Limited evidence of systematic optimization
401
-
402
- ### Task 4 Recommendations
403
-
404
- **Immediate UI Integration Fixes:**
405
-
406
- 1. **Connect Registry to Sidebar**: Replace `MODEL_CONFIG` with registry-based selection
407
- 2. **Dynamic Model Availability**: Show only models with available weights
408
- 3. **Unified Model Interface**: Consistent model loading across all UI components
409
- 4. **Better Error Handling**: Clear feedback when models unavailable
410
-
411
- **Model Architecture Improvements:**
412
-
413
- 1. **Modality-Specific Models**: Separate architectures optimized for Raman vs FTIR
414
- 2. **Transfer Learning**: Pre-train on one modality, fine-tune on another
415
- 3. **Multi-Modal Models**: Architectures that can handle both modalities simultaneously
416
- 4. **Uncertainty Quantification**: Add confidence estimates to model outputs
417
-
418
- **Training and Validation Enhancements:**
419
-
420
- 1. **Rigorous Cross-Validation**: Implement proper k-fold validation
421
- 2. **External Validation**: Test on independent datasets
422
- 3. **Hyperparameter Optimization**: Systematic search for optimal parameters
423
- 4. **Baseline Comparisons**: Compare against traditional chemometric methods
424
-
425
- ### Task 4 Reflection
426
-
427
- The model architecture diversity is impressive, but the UI integration is fundamentally broken due to configuration system evolution. The disconnect between registry (6 models) and UI (2 models) creates a poor user experience. Training validation appears adequate but could be more rigorous for scientific applications.
428
-
429
- ### Transition to Next Task
430
-
431
- With model integration issues identified, we now investigate the specific UI bugs that impact user experience and functionality, providing detailed analysis of each reported issue.
432
-
433
- ---
434
-
435
- ## Task 5: UI Bug Investigation
436
-
437
- ### Overview
438
-
439
- Detailed investigation of the four specific UI bugs reported: AI Model Selection limitations, modality validation issues, Model Comparison tab errors, and conflicting modality selectors.
440
-
441
- ### Steps
442
-
443
- #### Step 1: Bug A Analysis - AI Model Selection Limitation
444
-
445
- **What**: Investigated why "Choose AI Model" selectbox shows only 2 models instead of 6
446
- **How**: Traced code flow from UI rendering to model configuration
447
- **Why**: This bug prevents users from accessing 4 out of 6 available models
448
-
449
- **Root Cause Analysis:**
450
-
451
- ```python
452
- # In modules/ui_components.py line 197-199
453
- model_labels = [
454
- f"{MODEL_CONFIG[name]['emoji']} {name}" for name in MODEL_CONFIG.keys()
455
- ]
456
- ```
457
-
458
- **Problem**: UI uses `MODEL_CONFIG` from `config.py` which only defines 2 models:
459
-
460
- - "Figure2CNN (Baseline)"
461
- - "ResNet1D (Advanced)"
462
-
463
- **Missing Models**: 4 models from registry not accessible:
464
-
465
- - enhanced_cnn (97.5% accuracy)
466
- - efficient_cnn (95.5% accuracy)
467
- - hybrid_net (96.8% accuracy)
468
- - resnet18vision (94.5% accuracy)
469
-
470
- #### Step 2: Bug B Analysis - Modality Validation Issues
471
-
472
- **What**: Analyzed why modality selector allows incorrect data processing
473
- **How**: Examined data validation and routing logic between modality selection and preprocessing
474
- **Why**: This causes incorrect spectroscopy analysis and invalid results
475
-
476
- **Issue Identification:**
477
-
478
- - **Modality Selection**: Sidebar allows user to choose Raman or FTIR
479
- - **Data Upload**: User uploads spectrum file (no automatic modality detection)
480
- - **Processing Gap**: No validation that uploaded data matches selected modality
481
- - **Result**: FTIR data processed with Raman parameters or vice versa
482
-
483
- **Validation Missing:**
484
-
485
- - No automatic spectroscopy type detection from data characteristics
486
- - No wavenumber range validation against modality expectations
487
- - No warning when data doesn't match selected modality
488
-
489
- #### Step 3: Bug C Analysis - Model Comparison Tab Errors
490
-
491
- **What**: Investigated specific errors in Model Comparison tab functionality
492
- **How**: Analyzed error messages and async processing logic
493
- **Why**: These errors prevent multi-model comparison functionality
494
-
495
- **Error Analysis:**
496
-
497
- 1. **"Error loading model figure2: 'figure2'"**:
498
-
499
- - Registry uses key "figure2" but UI expects "Figure2CNN (Baseline)"
500
- - Model loading function expects config.py format, not registry format
501
-
502
- 2. **"Error loading model resnet: 'resnet'"**:
503
-
504
- - Same issue - key mismatch between registry and loading function
505
-
506
- 3. **"Error during comparison: min() arg is an empty sequence"**:
507
- - Occurs when no valid model results are available
508
- - Async processing fails and leaves empty results list
509
- - min() function called on empty list causes crash
510
-
511
- **Async Processing Issues:**
512
-
513
- - Models fail to load due to key mismatch
514
- - Error handling doesn't prevent downstream crashes
515
- - UI doesn't gracefully handle all-model-failure scenarios
516
-
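A small defensive guard shows how the comparison tab could avoid the `min() arg is an empty sequence` crash when every model fails; the result-dictionary keys below are assumptions for illustration.

```python
from typing import List, Optional

def summarize_comparison(results: List[dict]) -> Optional[dict]:
    """Return summary stats only when at least one model produced a valid result."""
    valid = [r for r in results if r.get("error") is None]
    if not valid:  # all models failed: report gracefully instead of calling min()
        return None
    fastest = min(valid, key=lambda r: r["inference_time"])
    best = max(valid, key=lambda r: r["confidence"])
    return {"fastest_model": fastest["model"], "most_confident_model": best["model"]}
```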
517
- #### Step 4: Bug D Analysis - Conflicting Modality Selectors
518
-
519
- **What**: Identified UX issue with two modality selectors having different values
520
- **How**: Examined state management between sidebar and main content areas
521
- **Why**: This creates user confusion and inconsistent application behavior
522
-
523
- **Selector Locations:**
524
-
525
- 1. **Sidebar**: `st.selectbox("Choose Modality", key="modality_select")`
526
- 2. **Comparison Tab**: `st.selectbox("Select Modality", key="comparison_modality")`
527
-
528
- **State Management Issue:**
529
-
530
- ```python
531
- # In comparison tab - line 1001
532
- st.session_state["modality_select"] = modality
533
- ```
534
-
535
- - Comparison tab overwrites sidebar state
536
- - No synchronization mechanism
537
- - Users can have contradictory settings visible simultaneously
538
-
539
- ### Task 5 Findings
540
-
541
- **Bug A - Model Selection (Critical):**
542
-
543
- - **Impact**: 66% of models inaccessible to users
544
- - **Cause**: Legacy configuration system override
545
- - **Severity**: High - Major functionality loss
546
-
547
- **Bug B - Modality Validation (High):**
548
-
549
- - **Impact**: Incorrect analysis results, misleading outputs
550
- - **Cause**: Missing data validation layer
551
- - **Severity**: High - Scientific accuracy compromised
552
-
553
- **Bug C - Comparison Errors (High):**
554
-
555
- - **Impact**: Multi-model comparison completely broken
556
- - **Cause**: Key mismatch between registry and loading systems
557
- - **Severity**: High - Core feature non-functional
558
-
559
- **Bug D - UI Inconsistency (Medium):**
560
-
561
- - **Impact**: User confusion, inconsistent behavior
562
- - **Cause**: Poor state management across components
563
- - **Severity**: Medium - UX degradation
564
-
565
- ### Task 5 Recommendations
566
-
567
- **Bug A - Immediate Fix:**
568
-
569
- ```python
570
- # Replace MODEL_CONFIG usage with registry
571
- from models.registry import choices, get_model_info
572
-
573
- # In render_sidebar():
574
- available_models = choices()
575
- model_labels = [f"{get_model_info(name).get('emoji', '')} {name}"
576
- for name in available_models]
577
- ```
578
-
579
- **Bug B - Data Validation:**
580
-
581
- ```python
582
- def validate_modality_match(x_data, y_data, selected_modality):
583
- """Validate that data characteristics match selected modality"""
584
- wavenumber_range = max(x_data) - min(x_data)
585
-
586
- if selected_modality == "raman" and not (200 <= min(x_data) <= 4000):
587
- return False, "Data appears to be FTIR, not Raman"
588
- elif selected_modality == "ftir" and not (400 <= min(x_data) <= 4000):
589
- return False, "Data appears to be Raman, not FTIR"
590
-
591
- return True, "Modality validated"
592
- ```
593
-
594
- **Bug C - Model Loading Fix:**
595
-
596
- ```python
597
- # Unify model loading to use registry keys consistently
598
- def load_model_from_registry(model_key):
599
- """Load model using registry system"""
600
- from models.registry import build, spec
601
- model = build(model_key, 500)
602
- return model
603
- ```
604
-
605
- **Bug D - State Synchronization:**
606
-
607
- ```python
608
- # Implement centralized modality state
609
- def sync_modality_state():
610
- """Ensure all modality selectors show same value"""
611
- if "comparison_modality" in st.session_state:
612
- st.session_state["modality_select"] = st.session_state["comparison_modality"]
613
- ```
614
-
615
- ### Task 5 Reflection
616
-
617
- All four bugs stem from the evolution of the codebase where new systems (registry) were added without updating dependent components. The fixes are straightforward but require systematic updates across multiple files. The bugs range from critical functionality loss to user experience degradation.
618
-
619
- ### Transition to Next Task
620
-
621
- With all bugs identified and root causes understood, we can now propose comprehensive improvements that address not only the immediate issues but also enhance the overall pipeline performance and usability.
622
-
623
- ---
624
-
625
- ## Task 6: Improvement Proposals
626
-
627
- ### Overview
628
-
629
- Proposing comprehensive improvements for identified issues, prioritizing FTIR feature enhancements, Raman optimization, and UI bug fixes based on the analysis from Tasks 1-5.
630
-
631
- ### Steps
632
-
633
- #### Step 1: Immediate Critical Fixes (High Priority)
634
-
635
- **What**: Address bugs that prevent core functionality
636
- **How**: Systematic fixes for model selection, modality validation, and UI consistency
637
- **Why**: These issues block users from accessing key features and compromise result accuracy
638
-
639
- **Priority 1: Model Selection Fix (Bug A)**
640
-
641
- ```python
642
- # File: modules/ui_components.py
643
- # Replace lines 197-199 with:
644
- from models.registry import choices, get_model_info
645
-
646
- def render_sidebar():
647
- # ... existing code ...
648
-
649
- # Model selection using registry
650
- st.markdown("##### AI Model Selection")
651
- available_models = choices()
652
-
653
- # Check model availability dynamically
654
- available_with_weights = []
655
- for model_key in available_models:
656
- # Check if weights exist
657
- model_info = get_model_info(model_key)
658
- # Add availability check here
659
- available_with_weights.append(model_key)
660
-
661
- model_options = {name: get_model_info(name) for name in available_with_weights}
662
- selected_model = st.selectbox(
663
- "Choose AI Model",
664
- list(model_options.keys()),
665
- key="model_select",
666
- format_func=lambda x: f"{model_options[x].get('description', x)}",
667
- on_change=on_model_change,
668
- )
669
- ```
670
-
671
- **Priority 2: Modality Validation (Bug B)**
672
-
673
- ```python
674
- # File: utils/preprocessing.py
675
- # Add validation function
676
- def validate_spectrum_modality(x_data, y_data, selected_modality):
677
- """Validate spectrum characteristics match selected modality"""
678
- x_min, x_max = min(x_data), max(x_data)
679
-
680
- validation_rules = {
681
- 'raman': {
682
- 'min_wavenumber': 200,
683
- 'max_wavenumber': 4000,
684
- 'typical_peaks': 'sharp',
685
- 'baseline': 'stable'
686
- },
687
- 'ftir': {
688
- 'min_wavenumber': 400,
689
- 'max_wavenumber': 4000,
690
- 'typical_peaks': 'broad',
691
- 'baseline': 'variable'
692
- }
693
- }
694
-
695
- rules = validation_rules[selected_modality]
696
- issues = []
697
-
698
- if x_min < rules['min_wavenumber'] or x_max > rules['max_wavenumber']:
699
- issues.append(f"Wavenumber range {x_min:.0f}-{x_max:.0f} cm⁻¹ unusual for {selected_modality.upper()}")
700
-
701
- return len(issues) == 0, issues
702
- ```
703
-
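- A hedged usage sketch showing where this check could be wired in; the handler name `handle_uploaded_spectrum` is hypothetical, and only `validate_spectrum_modality` above is part of the proposal:
-
- ```python
- import streamlit as st
-
- from utils.preprocessing import validate_spectrum_modality
-
-
- def handle_uploaded_spectrum(x_data, y_data, selected_modality):
-     """Hypothetical upload hook: surface validation issues instead of silently mis-processing."""
-     is_valid, issues = validate_spectrum_modality(x_data, y_data, selected_modality)
-     if not is_valid:
-         for issue in issues:
-             st.warning(issue)
-         st.info("Confirm that the selected modality matches the uploaded spectrum before running inference.")
-     return is_valid
- ```
-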
704
- #### Step 2: FTIR Performance Enhancement (High Priority)
705
-
706
- **What**: Implement FTIR-specific preprocessing and feature extraction improvements
707
- **How**: Enable atmospheric corrections, add derivative spectroscopy, improve normalization
708
- **Why**: FTIR currently underperforms due to inappropriate processing for its spectroscopic characteristics
709
-
710
- **Enhanced FTIR Preprocessing:**
711
-
712
- ```python
713
- # File: utils/preprocessing.py
714
- # Modify MODALITY_PARAMS for FTIR
715
- MODALITY_PARAMS = {
716
- "ftir": {
717
- "baseline_degree": 3, # More aggressive baseline correction
718
- "smooth_window": 15, # Wider smoothing for broad bands
719
- "smooth_polyorder": 3,
720
- "atmospheric_correction": True, # Enable by default
721
- "water_correction": True, # Enable by default
722
- "derivative_order": 1, # Add first derivative
723
- "normalize_method": "vector", # L2 normalization better for FTIR
724
- "region_weighting": True, # Weight important chemical regions
725
- }
726
- }
727
-
728
- def apply_ftir_enhancements(x, y):
729
- """Enhanced FTIR preprocessing pipeline"""
730
- # 1. Remove atmospheric interference
731
- y_clean = remove_atmospheric_interference(y)
732
-
733
- # 2. Advanced baseline correction (airPLS or rubber band)
734
- y_baseline = advanced_baseline_correction(y_clean, method='airPLS')
735
-
736
- # 3. First derivative for peak enhancement
737
- y_deriv = np.gradient(y_baseline)
738
-
739
- # 4. Region-of-interest weighting
740
- y_weighted = apply_chemical_region_weighting(x, y_deriv)
741
-
742
- # 5. Vector normalization
743
- y_normalized = y_weighted / np.linalg.norm(y_weighted)
744
-
745
- return y_normalized
746
- ```
747
-
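- The helpers above (`remove_atmospheric_interference`, `advanced_baseline_correction`, `apply_chemical_region_weighting`) are left to the implementation. As one hedged example, a minimal `remove_atmospheric_interference` could bridge the atmospheric CO₂ band (~2349 cm⁻¹) by linear interpolation; note that this sketch also needs the wavenumber axis (the pipeline above passes only `y`) and assumes the axis is sorted in ascending order:
-
- ```python
- import numpy as np
-
-
- def remove_atmospheric_interference(x, y, co2_band=(2300.0, 2400.0)):
-     """Minimal sketch: interpolate across the atmospheric CO2 absorption band."""
-     x = np.asarray(x, dtype=float)
-     y_clean = np.asarray(y, dtype=float).copy()
-     mask = (x >= co2_band[0]) & (x <= co2_band[1])
-     if mask.any() and (~mask).any():
-         # Rebuild the masked region from the surrounding, unaffected points
-         y_clean[mask] = np.interp(x[mask], x[~mask], y_clean[~mask])
-     return y_clean
- ```
-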
748
- **FTIR-Specific Model Architecture:**
749
-
750
- ```python
751
- # File: models/ftir_cnn.py
- import torch
- import torch.nn as nn
-
752
- class FTIRSpecificCNN(nn.Module):
753
- """CNN architecture optimized for FTIR characteristics"""
754
-
755
- def __init__(self, input_length=500):
756
- super().__init__()
757
-
758
- # Multi-scale convolutions for broad absorption bands
759
- self.multi_scale_conv = nn.ModuleList([
760
- nn.Conv1d(1, 32, kernel_size=3, padding=1), # Fine features
761
- nn.Conv1d(1, 32, kernel_size=7, padding=3), # Medium features
762
- nn.Conv1d(1, 32, kernel_size=15, padding=7), # Broad features
763
- ])
764
-
765
- # Attention mechanism for chemical region focus
766
- self.attention = nn.MultiheadAttention(96, 8)
767
-
768
- # Chemical group detection layers
769
- self.chemical_layers = nn.Sequential(
770
- nn.Conv1d(96, 64, kernel_size=5, padding=2),
771
- nn.BatchNorm1d(64),
772
- nn.ReLU(),
773
- nn.Dropout(0.3)
774
- )
775
-
776
- # Classification head
777
- self.classifier = nn.Sequential(
778
- nn.AdaptiveAvgPool1d(1),
779
- nn.Flatten(),
780
- nn.Linear(64, 32),
781
- nn.ReLU(),
782
- nn.Dropout(0.5),
783
- nn.Linear(32, 2)
784
- )
785
-
786
- def forward(self, x):
787
- # Multi-scale feature extraction
788
- scale_features = []
789
- for conv in self.multi_scale_conv:
790
- scale_features.append(conv(x))
791
-
792
- # Concatenate multi-scale features
793
- features = torch.cat(scale_features, dim=1)
794
-
795
- # Apply attention
796
- features = features.permute(2, 0, 1) # seq_len, batch, features
797
- attended, _ = self.attention(features, features, features)
798
- attended = attended.permute(1, 2, 0) # batch, features, seq_len
799
-
800
- # Chemical group detection
801
- chemical_features = self.chemical_layers(attended)
802
-
803
- # Classification
804
- output = self.classifier(chemical_features)
805
- return output
806
- ```
807
-
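- A quick shape sanity check for the proposed architecture (assuming the class is saved at `models/ftir_cnn.py` as indicated above):
-
- ```python
- import torch
-
- from models.ftir_cnn import FTIRSpecificCNN  # path as proposed above
-
- model = FTIRSpecificCNN(input_length=500)
- dummy = torch.randn(4, 1, 500)  # batch of 4 single-channel spectra, 500 points each
- logits = model(dummy)
- print(logits.shape)  # expected: torch.Size([4, 2])
- ```
-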
808
- #### Step 3: Raman Optimization (Medium Priority)
809
-
810
- **What**: Enhance Raman preprocessing and add advanced denoising capabilities
811
- **How**: Enable cosmic ray removal, adaptive smoothing, and weak signal enhancement
812
- **Why**: Raman works adequately but has room for optimization, especially for weak signals
813
-
814
- **Raman Enhancements:**
815
-
816
- ```python
817
- # File: utils/raman_enhancement.py
- import numpy as np
- from scipy.signal import savgol_filter
-
818
- def enhanced_raman_preprocessing(x, y):
819
- """Enhanced Raman preprocessing with cosmic ray removal and adaptive denoising"""
820
-
821
- # 1. Cosmic ray removal
822
- y_clean = remove_cosmic_rays(y, threshold=3.0)
823
-
824
- # 2. Adaptive smoothing based on signal-to-noise ratio
825
- snr = calculate_snr(y_clean)
826
- if snr < 10:
827
- # Strong smoothing for noisy data
828
- y_smooth = savgol_filter(y_clean, window_length=15, polyorder=2)
829
- else:
830
- # Light smoothing for clean data
831
- y_smooth = savgol_filter(y_clean, window_length=7, polyorder=2)
832
-
833
- # 3. Baseline correction optimized for Raman
834
- y_baseline = polynomial_baseline_correction(y_smooth, degree=2)
835
-
836
- # 4. Peak enhancement for weak signals
837
- if snr < 5:
838
- y_enhanced = enhance_weak_peaks(y_baseline)
839
- else:
840
- y_enhanced = y_baseline
841
-
842
- return y_enhanced
843
-
844
- def remove_cosmic_rays(spectrum, threshold=3.0):
845
-     """Remove cosmic ray spikes via derivative-based detection and interpolation"""
846
-     spectrum = np.asarray(spectrum, dtype=float)
-     diff = np.abs(np.diff(spectrum, prepend=spectrum[0]))
847
-     spikes = diff > threshold * np.std(diff)
848
-     # Replace flagged points by interpolating across their unaffected neighbours
-     idx = np.arange(len(spectrum))
-     return np.interp(idx, idx[~spikes], spectrum[~spikes])
849
- ```
850
-
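- The pipeline above relies on a `calculate_snr` helper that is not defined in this proposal; a crude, hedged estimate could treat the spread of the first differences as the noise floor:
-
- ```python
- import numpy as np
-
-
- def calculate_snr(spectrum):
-     """Rough SNR estimate (sketch): strongest peak height over a first-difference noise floor."""
-     spectrum = np.asarray(spectrum, dtype=float)
-     signal = np.max(spectrum) - np.median(spectrum)
-     noise = np.std(np.diff(spectrum)) + 1e-12  # guard against division by zero on flat input
-     return float(signal / noise)
- ```
-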
851
- #### Step 4: UI/UX Improvements (Medium Priority)
852
-
853
- **What**: Fix remaining UI bugs and enhance user experience
854
- **How**: Implement state synchronization, better error handling, and improved feedback
855
- **Why**: Good UX is essential for user adoption and prevents analysis errors
856
-
857
- **State Synchronization Fix:**
858
-
859
- ```python
860
- # File: modules/ui_components.py
861
- def synchronize_modality_state():
862
- """Ensure consistent modality selection across all UI components"""
863
- # Check if any modality selector changed
864
- sidebar_modality = st.session_state.get("modality_select", "raman")
865
- comparison_modality = st.session_state.get("comparison_modality", "raman")
866
-
867
- # Sync states
868
- if sidebar_modality != comparison_modality:
869
- # Use most recent change
870
- if "comparison_modality" in st.session_state:
871
- st.session_state["modality_select"] = comparison_modality
872
- else:
873
- st.session_state["comparison_modality"] = sidebar_modality
874
-
875
- # Call this function at the start of each page render
876
- ```
877
-
878
- **Enhanced Error Handling:**
879
-
880
- ```python
881
- # File: core_logic.py
882
- def load_model_with_validation(model_name):
883
- """Load model with comprehensive validation and user feedback"""
884
- try:
885
-         from models.registry import build, spec, choices, get_model_info
886
-
887
- # Check if model exists in registry
888
- if model_name not in choices():
889
- st.error(f"❌ Model '{model_name}' not found in registry")
890
- return None, False
891
-
892
- # Get model info
893
- model_info = get_model_info(model_name)
894
-
895
- # Build model
896
- model = build(model_name, 500)
897
-
898
- # Check for weights
899
- weight_path = f"model_weights/{model_name}_model.pth"
900
- if os.path.exists(weight_path):
901
- state_dict = torch.load(weight_path, map_location="cpu")
902
- model.load_state_dict(state_dict)
903
- st.success(f"✅ Model '{model_name}' loaded successfully")
904
- return model, True
905
- else:
906
- st.warning(f"⚠️ Weights not found for '{model_name}'. Using random initialization.")
907
- return model, False
908
-
909
- except Exception as e:
910
- st.error(f"❌ Error loading model '{model_name}': {str(e)}")
911
- return None, False
912
- ```
913
-
914
- #### Step 5: Advanced Improvements (Lower Priority)
915
-
916
- **What**: Implement advanced features for enhanced analysis capabilities
917
- **How**: Add ensemble methods, uncertainty quantification, and automated quality assessment
918
- **Why**: These improvements enhance the scientific rigor and usability of the platform
919
-
920
- **Ensemble Modeling:**
921
-
922
- ```python
923
- # File: models/ensemble.py
- import numpy as np
-
- from models.registry import build
- # NOTE: `is_model_compatible` is assumed to be provided by the registry / compatibility layer
-
924
- class SpectroscopyEnsemble:
925
- """Ensemble of models for robust predictions"""
926
-
927
- def __init__(self, model_names, modality):
928
- self.models = {}
929
- self.modality = modality
930
-
931
- for name in model_names:
932
- if is_model_compatible(name, modality):
933
- self.models[name] = build(name, 500)
934
-
935
- def predict_with_uncertainty(self, x):
936
- """Predict with uncertainty quantification"""
937
- predictions = []
938
- confidences = []
939
-
940
- for name, model in self.models.items():
941
- pred, conf = model.predict_with_confidence(x)
942
- predictions.append(pred)
943
- confidences.append(conf)
944
-
945
- # Ensemble prediction
946
- ensemble_pred = np.mean(predictions, axis=0)
947
- ensemble_std = np.std(predictions, axis=0)
948
-
949
- return ensemble_pred, ensemble_std
950
- ```
951
-
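- A hedged usage sketch for the ensemble; the `x_input` variable and the 0.15 disagreement threshold are illustrative only, while `choices()` comes from the existing registry:
-
- ```python
- import numpy as np
-
- from models.ensemble import SpectroscopyEnsemble  # module proposed above
- from models.registry import choices
-
- ensemble = SpectroscopyEnsemble(model_names=choices(), modality="raman")
- pred, spread = ensemble.predict_with_uncertainty(x_input)  # x_input: a preprocessed spectrum (assumed)
- if float(np.max(spread)) > 0.15:
-     print("High disagreement across models - treat this prediction as low confidence.")
- ```
-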
952
- ### Task 6 Recommendations Summary
953
-
954
- **Immediate Actions (Week 1):**
955
-
956
- 1. Fix model selection bug by connecting UI to registry
957
- 2. Implement modality validation for uploaded data
958
- 3. Resolve model comparison tab errors
959
- 4. Synchronize modality selectors across UI
960
-
961
- **FTIR Enhancement (Week 2-3):**
962
-
963
- 1. Enable atmospheric and water corrections by default
964
- 2. Implement FTIR-specific preprocessing pipeline
965
- 3. Add derivative spectroscopy capabilities
966
- 4. Create FTIR-optimized model architecture
967
-
968
- **Raman Optimization (Week 3-4):**
969
-
970
- 1. Implement cosmic ray removal
971
- 2. Add adaptive preprocessing based on signal quality
972
- 3. Enhance weak signal detection capabilities
973
- 4. Optimize baseline correction parameters
974
-
975
- **Advanced Features (Month 2):**
976
-
977
- 1. Implement ensemble modeling with uncertainty quantification
978
- 2. Add automated data quality assessment
979
- 3. Create modality-specific model architectures
980
- 4. Develop comprehensive validation framework
981
-
982
- ### Task 6 Reflection
983
-
984
- The proposed improvements address immediate functionality issues while building toward a more robust, scientifically rigorous platform. The modular architecture makes these improvements feasible to implement incrementally. Priority is given to fixes that restore core functionality, followed by scientific accuracy improvements, and finally advanced features for enhanced usability.
985
-
986
- ### Final Recommendations
987
-
988
- The ML pipeline shows strong architectural foundations but suffers from evolution-related inconsistencies and inadequate domain-specific optimization. The proposed improvements will restore full functionality, significantly enhance FTIR performance, optimize Raman processing, and improve user experience. Implementation should proceed in priority order to quickly restore core functionality while building toward advanced capabilities.
989
-
990
- ---
991
-
992
- ## Overall Conclusions
993
-
994
- ### Critical Issues Summary
995
-
996
- 1. **UI-Backend Disconnect**: Model registry not connected to UI (Bug A)
997
- 2. **FTIR Processing Inadequacy**: Generic preprocessing fails for FTIR characteristics
998
- 3. **Missing Data Validation**: No modality-data matching verification (Bug B)
999
- 4. **Inconsistent State Management**: Multiple modality selectors conflict (Bug D)
1000
- 5. **Broken Comparison Feature**: Model loading failures prevent comparisons (Bug C)
1001
-
1002
- ### Success Factors
1003
-
1004
- 1. **Strong Architecture**: Modular design supports improvements
1005
- 2. **Comprehensive Model Registry**: Good variety of architectures available
1006
- 3. **Solid Preprocessing Foundation**: Framework exists, needs optimization
1007
- 4. **Quality Tracking**: Performance monitoring infrastructure in place
1008
-
1009
- ### Implementation Priority
1010
-
1011
- 1. **Immediate**: Fix UI bugs to restore functionality
1012
- 2. **High**: Enhance FTIR processing for scientific accuracy
1013
- 3. **Medium**: Optimize Raman processing and improve UX
1014
- 4. **Future**: Add advanced features and ensemble methods
1015
-
1016
- The analysis reveals a platform with excellent potential held back by integration issues and inadequate domain-specific optimization. The proposed improvements will transform it into a robust, scientifically rigorous tool for polymer degradation analysis.
 
__pycache__.py ADDED
File without changes
pages/Collaborative_Research.py DELETED
@@ -1,700 +0,0 @@
1
- """
2
- Collaborative Research Interface for POLYMEROS
3
- Community-driven research and validation tools
4
- """
5
-
6
- import streamlit as st
7
- import json
8
- import numpy as np
9
- import matplotlib.pyplot as plt
10
- from datetime import datetime, timedelta
11
- from typing import Dict, List, Any
12
- import uuid
13
-
14
- # Import POLYMEROS components
15
- import sys
16
- import os
17
-
18
- sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
19
-
20
- from modules.enhanced_data import KnowledgeGraph, ContextualSpectrum
21
-
22
-
23
- def init_collaborative_session():
24
- """Initialize collaborative research session"""
25
- if "research_projects" not in st.session_state:
26
- st.session_state.research_projects = load_demo_projects()
27
-
28
- if "community_hypotheses" not in st.session_state:
29
- st.session_state.community_hypotheses = load_demo_hypotheses()
30
-
31
- if "user_profile" not in st.session_state:
32
- st.session_state.user_profile = {
33
- "user_id": "demo_researcher",
34
- "name": "Demo Researcher",
35
- "expertise_areas": ["polymer_chemistry", "spectroscopy"],
36
- "reputation_score": 85,
37
- "contributions": 12,
38
- }
39
-
40
-
41
- def load_demo_projects():
42
- """Load demonstration research projects"""
43
- return [
44
- {
45
- "id": "proj_001",
46
- "title": "Microplastic Degradation Pathways",
47
- "description": "Investigating spectroscopic signatures of microplastic degradation in marine environments",
48
- "lead_researcher": "Dr. Sarah Chen",
49
- "institution": "Ocean Research Institute",
50
- "collaborators": ["University of Tokyo", "MIT Marine Lab"],
51
- "status": "active",
52
- "created_date": "2024-01-15",
53
- "datasets": 3,
54
- "participants": 8,
55
- "recent_activity": "New FTIR dataset uploaded",
56
- "tags": ["microplastics", "marine_degradation", "FTIR"],
57
- },
58
- {
59
- "id": "proj_002",
60
- "title": "Biodegradable Polymer Performance",
61
- "description": "Comparative study of biodegradable polymer aging under different environmental conditions",
62
- "lead_researcher": "Prof. Michael Rodriguez",
63
- "institution": "Sustainable Materials Lab",
64
- "collaborators": ["Stanford University", "Green Chemistry Institute"],
65
- "status": "recruiting",
66
- "created_date": "2024-02-20",
67
- "datasets": 1,
68
- "participants": 3,
69
- "recent_activity": "Seeking Raman spectroscopy expertise",
70
- "tags": ["biodegradable", "sustainability", "aging"],
71
- },
72
- {
73
- "id": "proj_003",
74
- "title": "AI-Assisted Polymer Discovery",
75
- "description": "Developing machine learning models for predicting polymer properties from spectroscopic data",
76
- "lead_researcher": "Dr. Aisha Patel",
77
- "institution": "AI Materials Research Center",
78
- "collaborators": ["DeepMind", "Google Research"],
79
- "status": "published",
80
- "created_date": "2023-11-10",
81
- "datasets": 15,
82
- "participants": 25,
83
- "recent_activity": "Results published in Nature Materials",
84
- "tags": ["machine_learning", "property_prediction", "discovery"],
85
- },
86
- ]
87
-
88
-
89
- def load_demo_hypotheses():
90
- """Load demonstration community hypotheses"""
91
- return [
92
- {
93
- "id": "hyp_001",
94
- "statement": "Carbonyl peak intensity at 1715 cm⁻¹ correlates linearly with UV exposure time in PE samples",
95
- "proposer": "Dr. Sarah Chen",
96
- "institution": "Ocean Research Institute",
97
- "created_date": "2024-03-01",
98
- "supporting_evidence": [
99
- "Time-series FTIR data from 50 PE samples",
100
- "Controlled UV chamber experiments",
101
- "Statistical correlation analysis (R² = 0.89)",
102
- ],
103
- "validation_status": "under_review",
104
- "peer_scores": [4.2, 3.8, 4.5, 4.0],
105
- "experimental_confirmations": 2,
106
- "tags": ["PE", "UV_degradation", "carbonyl"],
107
- "discussion_points": 8,
108
- },
109
- {
110
- "id": "hyp_002",
111
- "statement": "Machine learning models show systematic bias against weathered polymers with low crystallinity",
112
- "proposer": "Prof. Michael Rodriguez",
113
- "institution": "Sustainable Materials Lab",
114
- "created_date": "2024-02-15",
115
- "supporting_evidence": [
116
- "Model performance analysis across 1000+ samples",
117
- "Crystallinity correlation studies",
118
- "Bias detection algorithm results",
119
- ],
120
- "validation_status": "confirmed",
121
- "peer_scores": [4.8, 4.5, 4.7, 4.9],
122
- "experimental_confirmations": 5,
123
- "tags": ["machine_learning", "bias", "crystallinity"],
124
- "discussion_points": 15,
125
- },
126
- ]
127
-
128
-
129
- def render_research_projects():
130
- """Render collaborative research projects interface"""
131
- st.header("🔬 Collaborative Research Projects")
132
-
133
- # Project filters
134
- col1, col2, col3 = st.columns(3)
135
- with col1:
136
- status_filter = st.selectbox(
137
- "Status:", ["all", "active", "recruiting", "published"]
138
- )
139
- with col2:
140
- tag_filter = st.selectbox(
141
- "Domain:", ["all", "microplastics", "biodegradable", "machine_learning"]
142
- )
143
- with col3:
144
- sort_by = st.selectbox("Sort by:", ["recent", "participants", "datasets"])
145
-
146
- # Filter and sort projects
147
- projects = st.session_state.research_projects
148
-
149
- if status_filter != "all":
150
- projects = [p for p in projects if p["status"] == status_filter]
151
-
152
- if tag_filter != "all":
153
- projects = [p for p in projects if tag_filter in p["tags"]]
154
-
155
- # Display projects
156
- for project in projects:
157
- with st.expander(f"📋 {project['title']} ({project['status'].title()})"):
158
- col1, col2 = st.columns([2, 1])
159
-
160
- with col1:
161
- st.write(f"**Description:** {project['description']}")
162
- st.write(
163
- f"**Lead Researcher:** {project['lead_researcher']} ({project['institution']})"
164
- )
165
- st.write(f"**Collaborators:** {', '.join(project['collaborators'])}")
166
- st.write(f"**Tags:** {', '.join(project['tags'])}")
167
-
168
- with col2:
169
- st.metric("Participants", project["participants"])
170
- st.metric("Datasets", project["datasets"])
171
- st.write(f"**Created:** {project['created_date']}")
172
- st.write(f"**Recent:** {project['recent_activity']}")
173
-
174
- # Action buttons
175
- button_col1, button_col2, button_col3 = st.columns(3)
176
- with button_col1:
177
- if st.button(f"Join Project", key=f"join_{project['id']}"):
178
- st.success("Interest registered! Project lead will be notified.")
179
-
180
- with button_col2:
181
- if st.button(f"View Details", key=f"view_{project['id']}"):
182
- render_project_details(project)
183
-
184
- with button_col3:
185
- if st.button(f"Contact Lead", key=f"contact_{project['id']}"):
186
- st.info("Contact request sent to project lead.")
187
-
188
- # Create new project
189
- st.subheader("➕ Start New Project")
190
- with st.expander("Create Research Project"):
191
- project_title = st.text_input("Project Title:")
192
- project_description = st.text_area("Project Description:")
193
- research_areas = st.multiselect(
194
- "Research Areas:",
195
- [
196
- "polymer_chemistry",
197
- "spectroscopy",
198
- "machine_learning",
199
- "sustainability",
200
- "degradation",
201
- ],
202
- )
203
-
204
- if st.button("Create Project"):
205
- if project_title and project_description:
206
- new_project = {
207
- "id": f"proj_{len(st.session_state.research_projects) + 1:03d}",
208
- "title": project_title,
209
- "description": project_description,
210
- "lead_researcher": st.session_state.user_profile["name"],
211
- "institution": "User Institution",
212
- "collaborators": [],
213
- "status": "recruiting",
214
- "created_date": datetime.now().strftime("%Y-%m-%d"),
215
- "datasets": 0,
216
- "participants": 1,
217
- "recent_activity": "Project created",
218
- "tags": research_areas,
219
- }
220
- st.session_state.research_projects.append(new_project)
221
- st.success("Project created successfully!")
222
- else:
223
- st.error("Please fill in required fields.")
224
-
225
-
226
- def render_project_details(project):
227
- """Render detailed project view"""
228
- st.subheader(f"Project Details: {project['title']}")
229
-
230
- # Project overview
231
- col1, col2 = st.columns(2)
232
- with col1:
233
- st.write(f"**Status:** {project['status'].title()}")
234
- st.write(f"**Lead:** {project['lead_researcher']}")
235
- st.write(f"**Institution:** {project['institution']}")
236
-
237
- with col2:
238
- st.write(f"**Created:** {project['created_date']}")
239
- st.write(f"**Participants:** {project['participants']}")
240
- st.write(f"**Datasets:** {project['datasets']}")
241
-
242
- # Tabs for different project aspects
243
- tab1, tab2, tab3, tab4 = st.tabs(
244
- ["Overview", "Datasets", "Collaborators", "Timeline"]
245
- )
246
-
247
- with tab1:
248
- st.write(project["description"])
249
- st.write(f"**Research Areas:** {', '.join(project['tags'])}")
250
-
251
- with tab2:
252
- st.write("**Available Datasets:**")
253
- # Mock dataset information
254
- datasets = [
255
- {
256
- "name": "PE_UV_exposure_series",
257
- "type": "FTIR",
258
- "samples": 150,
259
- "uploaded": "2024-03-01",
260
- },
261
- {
262
- "name": "Weathered_samples_marine",
263
- "type": "Raman",
264
- "samples": 75,
265
- "uploaded": "2024-02-15",
266
- },
267
- {
268
- "name": "Control_samples_lab",
269
- "type": "FTIR",
270
- "samples": 50,
271
- "uploaded": "2024-01-20",
272
- },
273
- ]
274
-
275
- for dataset in datasets:
276
- with st.expander(f"📊 {dataset['name']}"):
277
- st.write(f"**Type:** {dataset['type']}")
278
- st.write(f"**Samples:** {dataset['samples']}")
279
- st.write(f"**Uploaded:** {dataset['uploaded']}")
280
- if st.button(f"Access Dataset", key=f"access_{dataset['name']}"):
281
- st.info("Dataset access request submitted.")
282
-
283
- with tab3:
284
- st.write("**Project Collaborators:**")
285
- for collab in project["collaborators"]:
286
- st.write(f"• {collab}")
287
-
288
- st.write("**Recent Contributors:**")
289
- contributors = [
290
- {
291
- "name": "Dr. Sarah Chen",
292
- "contribution": "FTIR dataset",
293
- "date": "2024-03-01",
294
- },
295
- {
296
- "name": "Alex Johnson",
297
- "contribution": "Data analysis scripts",
298
- "date": "2024-02-28",
299
- },
300
- {
301
- "name": "Prof. Lisa Wang",
302
- "contribution": "Methodology review",
303
- "date": "2024-02-25",
304
- },
305
- ]
306
-
307
- for contrib in contributors:
308
- st.write(
309
- f"• **{contrib['name']}:** {contrib['contribution']} ({contrib['date']})"
310
- )
311
-
312
- with tab4:
313
- st.write("**Project Timeline:**")
314
- timeline_events = [
315
- {
316
- "date": "2024-03-01",
317
- "event": "New FTIR dataset uploaded",
318
- "type": "data",
319
- },
320
- {
321
- "date": "2024-02-25",
322
- "event": "Methodology peer review completed",
323
- "type": "review",
324
- },
325
- {
326
- "date": "2024-02-15",
327
- "event": "Two new collaborators joined",
328
- "type": "team",
329
- },
330
- {
331
- "date": "2024-01-20",
332
- "event": "Initial dataset published",
333
- "type": "data",
334
- },
335
- {"date": "2024-01-15", "event": "Project initiated", "type": "milestone"},
336
- ]
337
-
338
- for event in timeline_events:
339
- event_icon = {"data": "📊", "review": "🔍", "team": "👥", "milestone": "🎯"}
340
- st.write(
341
- f"{event_icon.get(event['type'], '📅')} **{event['date']}:** {event['event']}"
342
- )
343
-
344
-
345
- def render_community_hypotheses():
346
- """Render community hypothesis validation interface"""
347
- st.header("🧪 Community Hypotheses")
348
-
349
- # Hypothesis filters
350
- col1, col2 = st.columns(2)
351
- with col1:
352
- status_filter = st.selectbox(
353
- "Validation Status:", ["all", "under_review", "confirmed", "rejected"]
354
- )
355
- with col2:
356
- st.selectbox(
357
- "Research Domain:",
358
- ["all", "degradation", "machine_learning", "characterization"],
359
- )
360
-
361
- # Display hypotheses
362
- hypotheses = st.session_state.community_hypotheses
363
-
364
- for hypothesis in hypotheses:
365
- # Calculate average peer score
366
- avg_score = np.mean(hypothesis["peer_scores"])
367
-
368
- with st.expander(
369
- f"🧬 {hypothesis['statement'][:80]}... (Score: {avg_score:.1f}/5)"
370
- ):
371
- col1, col2 = st.columns([2, 1])
372
-
373
- with col1:
374
- st.write(f"**Full Statement:** {hypothesis['statement']}")
375
- st.write(
376
- f"**Proposer:** {hypothesis['proposer']} ({hypothesis['institution']})"
377
- )
378
- st.write(f"**Status:** {hypothesis['validation_status'].title()}")
379
-
380
- st.write("**Supporting Evidence:**")
381
- for evidence in hypothesis["supporting_evidence"]:
382
- st.write(f"• {evidence}")
383
-
384
- with col2:
385
- st.metric("Peer Score", f"{avg_score:.1f}/5")
386
- st.metric("Confirmations", hypothesis["experimental_confirmations"])
387
- st.metric("Discussions", hypothesis["discussion_points"])
388
- st.write(f"**Proposed:** {hypothesis['created_date']}")
389
-
390
- # Peer review section
391
- st.subheader("Peer Review")
392
-
393
- review_col1, review_col2 = st.columns(2)
394
- with review_col1:
395
- user_score = st.slider(
396
- "Your Score:", 1, 5, 3, key=f"score_{hypothesis['id']}"
397
- )
398
-
399
- with review_col2:
400
- if st.button("Submit Review", key=f"review_{hypothesis['id']}"):
401
- hypothesis["peer_scores"].append(user_score)
402
- st.success("Review submitted!")
403
-
404
- # Comments and discussion
405
- st.subheader("Community Discussion")
406
-
407
- # Mock discussion
408
- discussions = [
409
- {
410
- "author": "Dr. Sarah Chen",
411
- "comment": "Interesting correlation! Would like to see this tested with PP samples.",
412
- "date": "2024-03-02",
413
- },
414
- {
415
- "author": "Prof. Wang",
416
- "comment": "The R² value is impressive. Have you controlled for temperature effects?",
417
- "date": "2024-03-01",
418
- },
419
- {
420
- "author": "Alex Johnson",
421
- "comment": "We're seeing similar patterns in our lab. Happy to collaborate on validation.",
422
- "date": "2024-02-28",
423
- },
424
- ]
425
-
426
- for discussion in discussions:
427
- st.write(
428
- f"**{discussion['author']}** ({discussion['date']}): {discussion['comment']}"
429
- )
430
-
431
- # Add comment
432
- new_comment = st.text_area(
433
- "Add your comment:", key=f"comment_{hypothesis['id']}"
434
- )
435
- if st.button("Post Comment", key=f"post_{hypothesis['id']}"):
436
- if new_comment:
437
- st.success("Comment posted!")
438
- else:
439
- st.error("Please enter a comment.")
440
-
441
- # Submit new hypothesis
442
- st.subheader("➕ Propose New Hypothesis")
443
- with st.expander("Submit Hypothesis"):
444
- hyp_statement = st.text_area("Hypothesis Statement:")
445
- hyp_evidence = st.text_area("Supporting Evidence (one per line):")
446
- hyp_tags = st.multiselect(
447
- "Research Tags:",
448
- [
449
- "degradation",
450
- "machine_learning",
451
- "spectroscopy",
452
- "characterization",
453
- "prediction",
454
- ],
455
- )
456
-
457
- if st.button("Submit Hypothesis"):
458
- if hyp_statement and hyp_evidence:
459
- evidence_list = [
460
- e.strip() for e in hyp_evidence.split("\n") if e.strip()
461
- ]
462
- new_hypothesis = {
463
- "id": f"hyp_{len(st.session_state.community_hypotheses) + 1:03d}",
464
- "statement": hyp_statement,
465
- "proposer": st.session_state.user_profile["name"],
466
- "institution": "User Institution",
467
- "created_date": datetime.now().strftime("%Y-%m-%d"),
468
- "supporting_evidence": evidence_list,
469
- "validation_status": "under_review",
470
- "peer_scores": [],
471
- "experimental_confirmations": 0,
472
- "tags": hyp_tags,
473
- "discussion_points": 0,
474
- }
475
- st.session_state.community_hypotheses.append(new_hypothesis)
476
- st.success("Hypothesis submitted for peer review!")
477
- else:
478
- st.error("Please provide hypothesis statement and evidence.")
479
-
480
-
481
- def render_peer_review_system():
482
- """Render peer review and reputation system"""
483
- st.header("👥 Peer Review System")
484
-
485
- user_profile = st.session_state.user_profile
486
-
487
- # User reputation dashboard
488
- st.subheader("Your Research Profile")
489
-
490
- col1, col2, col3, col4 = st.columns(4)
491
- with col1:
492
- st.metric("Reputation Score", user_profile["reputation_score"])
493
- with col2:
494
- st.metric("Contributions", user_profile["contributions"])
495
- with col3:
496
- st.metric("Expertise Areas", len(user_profile["expertise_areas"]))
497
- with col4:
498
- st.metric("Active Reviews", 3) # Mock data
499
-
500
- # Expertise areas
501
- st.subheader("Research Expertise")
502
- current_expertise = user_profile["expertise_areas"]
503
- all_expertise = [
504
- "polymer_chemistry",
505
- "spectroscopy",
506
- "machine_learning",
507
- "materials_science",
508
- "degradation_mechanisms",
509
- "sustainability",
510
- ]
511
-
512
- new_expertise = st.multiselect(
513
- "Update your expertise areas:", all_expertise, default=current_expertise
514
- )
515
-
516
- if new_expertise != current_expertise:
517
- user_profile["expertise_areas"] = new_expertise
518
- st.success("Expertise areas updated!")
519
-
520
- # Pending reviews
521
- st.subheader("Pending Reviews")
522
-
523
- pending_reviews = [
524
- {
525
- "type": "hypothesis",
526
- "title": "Spectral band shifts indicate polymer chain scission",
527
- "author": "Dr. James Smith",
528
- "deadline": "2024-03-10",
529
- "complexity": "medium",
530
- },
531
- {
532
- "type": "dataset",
533
- "title": "UV-degraded PP sample collection",
534
- "author": "Prof. Lisa Wang",
535
- "deadline": "2024-03-15",
536
- "complexity": "low",
537
- },
538
- ]
539
-
540
- for review in pending_reviews:
541
- with st.expander(f"📋 {review['title']} (Due: {review['deadline']})"):
542
- st.write(f"**Type:** {review['type'].title()}")
543
- st.write(f"**Author:** {review['author']}")
544
- st.write(f"**Complexity:** {review['complexity'].title()}")
545
- st.write(f"**Deadline:** {review['deadline']}")
546
-
547
- if st.button("Start Review", key=f"start_{review['title'][:20]}"):
548
- st.info("Review interface would open here.")
549
-
550
- # Review quality metrics
551
- st.subheader("Review Quality Metrics")
552
-
553
- metrics = {
554
- "Average Review Time": "2.3 days",
555
- "Review Accuracy": "94%",
556
- "Helpfulness Score": "4.7/5",
557
- "Reviews Completed": "28",
558
- }
559
-
560
- metric_cols = st.columns(len(metrics))
561
- for i, (metric, value) in enumerate(metrics.items()):
562
- with metric_cols[i]:
563
- st.metric(metric, value)
564
-
565
-
566
- def render_knowledge_sharing():
567
- """Render knowledge sharing and collaboration tools"""
568
- st.header("📚 Knowledge Sharing Hub")
569
-
570
- # Recent contributions
571
- st.subheader("Recent Community Contributions")
572
-
573
- contributions = [
574
- {
575
- "type": "dataset",
576
- "title": "Marine microplastic spectral library",
577
- "contributor": "Dr. Sarah Chen",
578
- "date": "2024-03-05",
579
- "downloads": 47,
580
- "rating": 4.8,
581
- },
582
- {
583
- "type": "analysis_script",
584
- "title": "Automated peak identification algorithm",
585
- "contributor": "Alex Johnson",
586
- "date": "2024-03-03",
587
- "downloads": 23,
588
- "rating": 4.6,
589
- },
590
- {
591
- "type": "methodology",
592
- "title": "Best practices for sample preparation",
593
- "contributor": "Prof. Michael Rodriguez",
594
- "date": "2024-03-01",
595
- "downloads": 156,
596
- "rating": 4.9,
597
- },
598
- ]
599
-
600
- for contrib in contributions:
601
- with st.expander(f"📊 {contrib['title']} by {contrib['contributor']}"):
602
- col1, col2 = st.columns([2, 1])
603
-
604
- with col1:
605
- st.write(f"**Type:** {contrib['type'].replace('_', ' ').title()}")
606
- st.write(f"**Contributor:** {contrib['contributor']}")
607
- st.write(f"**Date:** {contrib['date']}")
608
-
609
- with col2:
610
- st.metric("Downloads", contrib["downloads"])
611
- st.metric("Rating", f"{contrib['rating']}/5")
612
-
613
- if st.button("Access Resource", key=f"access_{contrib['title'][:20]}"):
614
- st.success("Resource access granted!")
615
-
616
- # Upload new resource
617
- st.subheader("➕ Share Knowledge Resource")
618
-
619
- with st.expander("Upload Resource"):
620
- resource_type = st.selectbox(
621
- "Resource Type:", ["dataset", "analysis_script", "methodology"]
622
- )
623
- resource_title = st.text_input("Resource Title:")
624
- resource_description = st.text_area("Description:")
625
- resource_tags = st.multiselect(
626
- "Tags:",
627
- [
628
- "spectroscopy",
629
- "polymer_aging",
630
- "machine_learning",
631
- "data_analysis",
632
- "methodology",
633
- ],
634
- )
635
- uploaded_file = st.file_uploader("Upload File:")
636
-
637
- if st.button("Share Resource"):
638
- if (
639
- resource_title
640
- and resource_description
641
- and resource_tags
642
- and uploaded_file
643
- ):
644
- st.success(
645
- f"Resource of type '{resource_type}' uploaded and shared with the community!"
646
- )
647
- else:
648
- st.error("Please fill in all required fields.")
649
-
650
-
651
- def main():
652
- """Main collaborative research interface"""
653
- st.set_page_config(
654
- page_title="POLYMEROS Collaborative Research", page_icon="👥", layout="wide"
655
- )
656
-
657
- st.title("👥 POLYMEROS Collaborative Research")
658
- st.markdown("**Community-Driven Research and Validation Platform**")
659
-
660
- # Initialize session
661
- init_collaborative_session()
662
-
663
- # Sidebar navigation
664
- st.sidebar.title("🤝 Collaboration Tools")
665
- page = st.sidebar.selectbox(
666
- "Select tool:",
667
- [
668
- "Research Projects",
669
- "Community Hypotheses",
670
- "Peer Review System",
671
- "Knowledge Sharing",
672
- ],
673
- )
674
-
675
- # Display user profile in sidebar
676
- st.sidebar.markdown("---")
677
- st.sidebar.markdown("**Your Profile**")
678
- profile = st.session_state.user_profile
679
- st.sidebar.write(f"**Name:** {profile['name']}")
680
- st.sidebar.write(f"**Reputation:** {profile['reputation_score']}")
681
- st.sidebar.write(f"**Contributions:** {profile['contributions']}")
682
-
683
- # Render selected page
684
- if page == "Research Projects":
685
- render_research_projects()
686
- elif page == "Community Hypotheses":
687
- render_community_hypotheses()
688
- elif page == "Peer Review System":
689
- render_peer_review_system()
690
- elif page == "Knowledge Sharing":
691
- render_knowledge_sharing()
692
-
693
- # Footer
694
- st.sidebar.markdown("---")
695
- st.sidebar.markdown("**POLYMEROS Community**")
696
- st.sidebar.markdown("*Advancing polymer science together*")
697
-
698
-
699
- if __name__ == "__main__":
700
- main()
 
pages/Educational_Interface.py DELETED
@@ -1,405 +0,0 @@
1
- """
2
- Educational Interface Page for POLYMEROS
3
- Interactive learning system with adaptive progression and virtual laboratory
4
- """
5
-
6
- import streamlit as st
7
- import numpy as np
8
- import matplotlib.pyplot as plt
9
- import json
10
- from typing import Dict, List, Any
11
-
12
- # Import POLYMEROS educational components
13
- import sys
14
- import os
15
-
16
- sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), "modules"))
17
-
18
- from modules.educational_framework import EducationalFramework
19
-
20
-
21
- def init_educational_session():
22
- """Initialize educational session state"""
23
- if "educational_framework" not in st.session_state:
24
- st.session_state.educational_framework = EducationalFramework()
25
-
26
- if "current_user_id" not in st.session_state:
27
- st.session_state.current_user_id = "demo_user"
28
-
29
- if "user_progress" not in st.session_state:
30
- st.session_state.user_progress = (
31
- st.session_state.educational_framework.initialize_user(
32
- st.session_state.current_user_id
33
- )
34
- )
35
-
36
-
37
- def render_competency_assessment():
38
- """Render interactive competency assessment"""
39
- st.header("🧪 Knowledge Assessment")
40
-
41
- domains = ["spectroscopy_basics", "polymer_aging", "ai_ml_concepts"]
42
- selected_domain = st.selectbox(
43
- "Select assessment domain:",
44
- domains,
45
- format_func=lambda x: x.replace("_", " ").title(),
46
- )
47
-
48
- framework = st.session_state.educational_framework
49
- assessor = framework.competency_assessor
50
-
51
- if selected_domain in assessor.assessment_tasks:
52
- tasks = assessor.assessment_tasks[selected_domain]
53
-
54
- st.subheader(f"Assessment: {selected_domain.replace('_', ' ').title()}")
55
-
56
- responses = []
57
- for i, task in enumerate(tasks):
58
- st.write(f"**Question {i+1}:** {task['question']}")
59
-
60
- response = st.radio(
61
- f"Select answer for question {i+1}:",
62
- options=range(len(task["options"])),
63
- format_func=lambda x, task=task: task["options"][x],
64
- key=f"q_{selected_domain}_{i}",
65
- index=0,
66
- )
67
- responses.append(response)
68
-
69
- if st.button("Submit Assessment", key=f"submit_{selected_domain}"):
70
- results = framework.assess_user_competency(selected_domain, responses)
71
-
72
- st.success(f"Assessment completed! Score: {results['score']:.1%}")
73
- st.write(f"**Your level:** {results['level']}")
74
-
75
- st.subheader("Detailed Feedback:")
76
- for feedback in results["feedback"]:
77
- st.write(feedback)
78
-
79
- st.subheader("Recommendations:")
80
- for rec in results["recommendations"]:
81
- st.write(f"• {rec}")
82
-
83
-
84
- def render_learning_path():
85
- """Render personalized learning path"""
86
- st.header("🎯 Your Learning Path")
87
-
88
- user_progress = st.session_state.user_progress
89
- framework = st.session_state.educational_framework
90
-
91
- # Display current progress
92
- col1, col2, col3 = st.columns(3)
93
-
94
- with col1:
95
- st.metric("Completed Objectives", len(user_progress.completed_objectives))
96
-
97
- with col2:
98
- avg_score = (
99
- np.mean(list(user_progress.competency_scores.values()))
100
- if user_progress.competency_scores
101
- else 0
102
- )
103
- st.metric("Average Score", f"{avg_score:.1%}")
104
-
105
- with col3:
106
- st.metric("Current Level", user_progress.current_level.title())
107
-
108
- # Learning style selection
109
- st.subheader("Learning Preferences")
110
- learning_styles = ["visual", "hands-on", "theoretical", "collaborative"]
111
-
112
- current_style = user_progress.preferred_learning_style
113
- new_style = st.selectbox(
114
- "Preferred learning style:",
115
- learning_styles,
116
- index=(
117
- learning_styles.index(current_style)
118
- if current_style in learning_styles
119
- else 0
120
- ),
121
- )
122
-
123
- if new_style != current_style:
124
- user_progress.preferred_learning_style = new_style
125
- framework.save_user_progress()
126
- st.success("Learning style updated!")
127
-
128
- # Target competencies
129
- st.subheader("Learning Goals")
130
- target_competencies = st.multiselect(
131
- "Select areas you want to focus on:",
132
- ["spectroscopy", "polymer_science", "machine_learning", "data_analysis"],
133
- default=["spectroscopy", "polymer_science"],
134
- )
135
-
136
- if st.button("Generate Learning Path"):
137
- learning_path = framework.get_personalized_learning_path(target_competencies)
138
-
139
- if learning_path:
140
- st.subheader("Recommended Learning Path:")
141
-
142
- for i, item in enumerate(learning_path):
143
- objective = item["objective"]
144
-
145
- with st.expander(
146
- f"{i+1}. {objective['title']} (Level {objective['difficulty_level']})"
147
- ):
148
- st.write(f"**Description:** {objective['description']}")
149
- st.write(
150
- f"**Estimated time:** {objective['estimated_time']} minutes"
151
- )
152
- st.write(
153
- f"**Recommended approach:** {item['recommended_approach']}"
154
- )
155
-
156
- if item["priority_resources"]:
157
- st.write("**Priority resources:**")
158
- for resource in item["priority_resources"]:
159
- st.write(f"- {resource['type']}: {resource['url']}")
160
- else:
161
- st.info("Complete an assessment to get personalized recommendations!")
162
-
163
-
164
- def render_virtual_laboratory():
165
- """Render virtual laboratory interface"""
166
- st.header("🔬 Virtual Laboratory")
167
-
168
- framework = st.session_state.educational_framework
169
- virtual_lab = framework.virtual_lab
170
-
171
- # Select experiment
172
- experiments = list(virtual_lab.experiments.keys())
173
- selected_experiment = st.selectbox(
174
- "Select experiment:",
175
- experiments,
176
- format_func=lambda x: virtual_lab.experiments[x]["title"],
177
- )
178
-
179
- experiment_info = virtual_lab.experiments[selected_experiment]
180
-
181
- st.subheader(experiment_info["title"])
182
- st.write(f"**Description:** {experiment_info['description']}")
183
- st.write(f"**Difficulty:** {experiment_info['difficulty']}/5")
184
- st.write(f"**Estimated time:** {experiment_info['estimated_time']} minutes")
185
-
186
- # Experiment-specific inputs
187
- if selected_experiment == "polymer_identification":
188
- st.subheader("Polymer Identification Challenge")
189
- polymer_type = st.selectbox(
190
- "Select polymer to analyze:", ["PE", "PP", "PS", "PVC"]
191
- )
192
-
193
- if st.button("Generate Spectrum"):
194
- result = framework.run_virtual_experiment(
195
- selected_experiment, {"polymer_type": polymer_type}
196
- )
197
-
198
- if result.get("success"):
199
- # Plot the spectrum
200
- fig, ax = plt.subplots(figsize=(10, 6))
201
- ax.plot(result["wavenumbers"], result["spectrum"])
202
- ax.set_xlabel("Wavenumber (cm⁻¹)")
203
- ax.set_ylabel("Intensity")
204
- ax.set_title(f"Unknown Polymer Spectrum")
205
- ax.grid(True, alpha=0.3)
206
- st.pyplot(fig)
207
-
208
- st.subheader("Analysis Hints:")
209
- for hint in result["hints"]:
210
- st.write(f"💡 {hint}")
211
-
212
- # User identification
213
- user_guess = st.selectbox(
214
- "Your identification:", ["PE", "PP", "PS", "PVC"]
215
- )
216
- if st.button("Submit Identification"):
217
- if user_guess == polymer_type:
218
- st.success("🎉 Correct! Well done!")
219
- else:
220
- st.error(f"❌ Incorrect. The correct answer is {polymer_type}")
221
-
222
- elif selected_experiment == "aging_simulation":
223
- st.subheader("Polymer Aging Simulation")
224
- aging_time = st.slider("Aging time (hours):", 0, 200, 50)
225
-
226
- if st.button("Run Aging Simulation"):
227
- result = framework.run_virtual_experiment(
228
- selected_experiment, {"aging_time": aging_time}
229
- )
230
-
231
- if result.get("success"):
232
- # Plot comparison
233
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
234
-
235
- # Initial spectrum
236
- ax1.plot(result["wavenumbers"], result["initial_spectrum"])
237
- ax1.set_title("Initial Spectrum")
238
- ax1.set_xlabel("Wavenumber (cm⁻¹)")
239
- ax1.set_ylabel("Intensity")
240
- ax1.grid(True, alpha=0.3)
241
-
242
- # Aged spectrum
243
- ax2.plot(result["wavenumbers"], result["aged_spectrum"])
244
- ax2.set_title(f"After {aging_time} hours")
245
- ax2.set_xlabel("Wavenumber (cm⁻¹)")
246
- ax2.set_ylabel("Intensity")
247
- ax2.grid(True, alpha=0.3)
248
-
249
- plt.tight_layout()
250
- st.pyplot(fig)
251
-
252
- st.subheader("Observations:")
253
- for obs in result["observations"]:
254
- st.write(f"📊 {obs}")
255
-
256
- elif selected_experiment == "model_training":
257
- st.subheader("Train Your Own Model")
258
-
259
- col1, col2 = st.columns(2)
260
- with col1:
261
- model_type = st.selectbox("Model type:", ["CNN", "ResNet", "Transformer"])
262
- with col2:
263
- epochs = st.slider("Training epochs:", 5, 50, 10)
264
-
265
- if st.button("Start Training"):
266
- with st.spinner("Training model..."):
267
- result = framework.run_virtual_experiment(
268
- selected_experiment, {"model_type": model_type, "epochs": epochs}
269
- )
270
-
271
- if result.get("success"):
272
- # Plot training metrics
273
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
274
-
275
- # Training loss
276
- ax1.plot(result["train_losses"])
277
- ax1.set_title("Training Loss")
278
- ax1.set_xlabel("Epoch")
279
- ax1.set_ylabel("Loss")
280
- ax1.grid(True, alpha=0.3)
281
-
282
- # Validation accuracy
283
- ax2.plot(result["val_accuracies"])
284
- ax2.set_title("Validation Accuracy")
285
- ax2.set_xlabel("Epoch")
286
- ax2.set_ylabel("Accuracy")
287
- ax2.grid(True, alpha=0.3)
288
-
289
- plt.tight_layout()
290
- st.pyplot(fig)
291
-
292
- st.success(
293
- f"Training completed! Final accuracy: {result['final_accuracy']:.3f}"
294
- )
295
-
296
- st.subheader("Training Insights:")
297
- for insight in result["insights"]:
298
- st.write(f"🎯 {insight}")
299
-
300
-
301
- def render_progress_analytics():
302
- """Render learning analytics dashboard"""
303
- st.header("📊 Your Progress Analytics")
304
-
305
- framework = st.session_state.educational_framework
306
- analytics = framework.get_learning_analytics()
307
-
308
- if analytics:
309
- # Overview metrics
310
- col1, col2, col3, col4 = st.columns(4)
311
-
312
- with col1:
313
- st.metric("Completed Objectives", analytics["completed_objectives"])
314
-
315
- with col2:
316
- st.metric("Study Time", f"{analytics['total_study_time']} min")
317
-
318
- with col3:
319
- st.metric("Current Level", analytics["current_level"].title())
320
-
321
- with col4:
322
- st.metric("Sessions", analytics["session_count"])
323
-
324
- # Competency scores
325
- if analytics["competency_scores"]:
326
- st.subheader("Competency Scores")
327
-
328
- domains = list(analytics["competency_scores"].keys())
329
- scores = list(analytics["competency_scores"].values())
330
-
331
- fig, ax = plt.subplots(figsize=(10, 6))
332
- bars = ax.bar(domains, scores)
333
- ax.set_ylabel("Score")
334
- ax.set_title("Competency Assessment Results")
335
- ax.set_ylim(0, 1)
336
-
337
- # Color bars based on score
338
- for bar, score in zip(bars, scores):
339
- if score >= 0.8:
340
- bar.set_color("green")
341
- elif score >= 0.6:
342
- bar.set_color("orange")
343
- else:
344
- bar.set_color("red")
345
-
346
- plt.xticks(rotation=45)
347
- plt.tight_layout()
348
- st.pyplot(fig)
349
-
350
- # Learning style
351
- st.subheader("Learning Profile")
352
- st.write(f"**Preferred learning style:** {analytics['learning_style'].title()}")
353
-
354
- # Recommendations
355
- recommendations = framework.get_learning_recommendations()
356
- if recommendations:
357
- st.subheader("Next Steps")
358
- for rec in recommendations:
359
- st.write(f"• {rec}")
360
- else:
361
- st.info("Complete assessments to see your progress analytics!")
362
-
363
-
364
- def main():
365
- """Main educational interface"""
366
- st.set_page_config(
367
- page_title="POLYMEROS Educational Interface", page_icon="🎓", layout="wide"
368
- )
369
-
370
- st.title("🎓 POLYMEROS Educational Interface")
371
- st.markdown("**Interactive Learning System for Polymer Science and AI**")
372
-
373
- # Initialize session
374
- init_educational_session()
375
-
376
- # Sidebar navigation
377
- st.sidebar.title("📚 Learning Modules")
378
- page = st.sidebar.selectbox(
379
- "Select module:",
380
- [
381
- "Knowledge Assessment",
382
- "Learning Path",
383
- "Virtual Laboratory",
384
- "Progress Analytics",
385
- ],
386
- )
387
-
388
- # Render selected page
389
- if page == "Knowledge Assessment":
390
- render_competency_assessment()
391
- elif page == "Learning Path":
392
- render_learning_path()
393
- elif page == "Virtual Laboratory":
394
- render_virtual_laboratory()
395
- elif page == "Progress Analytics":
396
- render_progress_analytics()
397
-
398
- # Footer
399
- st.sidebar.markdown("---")
400
- st.sidebar.markdown("**POLYMEROS Educational Framework**")
401
- st.sidebar.markdown("*Adaptive learning for polymer science*")
402
-
403
-
404
- if __name__ == "__main__":
405
- main()
 
requirements.txt CHANGED
@@ -16,4 +16,20 @@ matplotlib
16
  xgboost
17
  requests
18
  Pillow
19
- plotly
 
16
  xgboost
17
  requests
18
  Pillow
19
+ plotly
20
+
21
+ # New additions for enhanced features
22
+ psutil
23
+ joblib
24
+ pytest
25
+ tqdm
26
+ pyarrow
27
+ tenacity
28
+ GitPython
29
+ docker
30
+ async-lru
31
+ anyio
32
+ websocket-client
33
+ inquirerpy
34
+ networkx
35
+ mermaid_cli