Files changed (7)
  1. Dockerfile +0 -31
  2. README.md +433 -158
  3. app.py +0 -1447
  4. fixed-novaeval-space.zip +3 -0
  5. novaeval-space-deployment.zip +3 -0
  6. package.json +39 -0
  7. requirements.txt +0 -6
Dockerfile DELETED
@@ -1,31 +0,0 @@
- FROM python:3.11-slim
-
- # Set working directory
- WORKDIR /app
-
- # Install system dependencies
- RUN apt-get update && apt-get install -y \
-     curl \
-     && rm -rf /var/lib/apt/lists/*
-
- # Copy requirements and install Python dependencies
- COPY requirements.txt .
- RUN pip install --no-cache-dir -r requirements.txt
-
- # Copy application code
- COPY app.py app.py
-
- # Create non-root user for security
- RUN useradd -m -u 1000 user
- USER user
-
- # Expose port
- EXPOSE 7860
-
- # Health check
- HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-     CMD curl -f http://localhost:7860/api/health || exit 1
-
- # Run the application
- CMD ["python", "app.py"]
-
README.md CHANGED
@@ -1,191 +1,466 @@
 ---
- title: NovaEval by Noveum.ai
- emoji:
- colorFrom: purple
- colorTo: blue
- sdk: docker
 pinned: false
 ---
-
 # NovaEval by Noveum.ai

- Advanced AI Model Evaluation Platform powered by Hugging Face Models

 ## 🚀 Features

- ### 🤖 **Comprehensive Model Selection**
- - **15+ Top Hugging Face Models** across different size categories
- - **Real-time Model Search** with provider filtering
- - **Detailed Model Information** including capabilities, size, and provider
- - **Size-based Filtering** (Small 1-3B, Medium 7B, Large 14B+)
-
- ### 📊 **Rich Dataset Collection**
- - **11 Evaluation Datasets** covering reasoning, knowledge, math, code, and language
- - **Category-based Filtering** for easy dataset discovery
- - **Detailed Dataset Information** including sample counts and difficulty levels
- - **Popular Benchmarks** like MMLU, HellaSwag, GSM8K, HumanEval
-
- ### ⚡ **Advanced Evaluation Engine**
- - **Real-time Progress Tracking** with WebSocket updates
- - **Live Evaluation Logs** showing detailed request/response data
- - **Multiple Metrics Support** (Accuracy, F1-Score, BLEU, ROUGE, Pass@K)
- - **Configurable Parameters** (sample size, temperature, max tokens)
-
- ### 🎨 **Modern User Interface**
- - **Responsive Design** optimized for desktop and mobile
- - **Interactive Model Cards** with hover effects and selection states
- - **Real-time Configuration** with sliders and checkboxes
- - **Professional Gradient Design** with smooth animations
-
- ## 🔧 **Technical Stack**
-
- - **Backend**: FastAPI + Python 3.11
- - **Frontend**: HTML5 + Tailwind CSS + Vanilla JavaScript
- - **Real-time**: WebSocket for live updates
- - **Models**: Hugging Face Inference API (free tier)
- - **Deployment**: Docker + Hugging Face Spaces
-
- ## 📋 **Available Models**
-
- ### Small Models (1-3B)
- - **FLAN-T5 Large** (0.8B) - Google
- - **Qwen 2.5 3B** (3B) - Alibaba
- - **Gemma 2B** (2B) - Google
-
- ### Medium Models (7B)
- - **Qwen 2.5 7B** (7B) - Alibaba
- - **Mistral 7B** (7B) - Mistral AI
- - **DialoGPT Medium** (345M) - Microsoft
- - **CodeLlama 7B Python** (7B) - Meta
-
- ### Large Models (14B+)
- - **Qwen 2.5 14B** (14B) - Alibaba
- - **Qwen 2.5 32B** (32B) - Alibaba
- - **Qwen 2.5 72B** (72B) - Alibaba
-
- ## 📊 **Available Datasets**
-
- ### Reasoning
- - **HellaSwag** - Commonsense reasoning (60K samples)
- - **CommonsenseQA** - Reasoning questions (12.1K samples)
- - **ARC** - Science reasoning (7.8K samples)
-
- ### Knowledge
- - **MMLU** - Multitask understanding (231K samples)
- - **BoolQ** - Reading comprehension (12.7K samples)
-
- ### Math
- - **GSM8K** - Grade school math (17.6K samples)
- - **AQUA-RAT** - Algebraic reasoning (196K samples)
-
- ### Code
- - **HumanEval** - Python code generation (164 samples)
- - **MBPP** - Basic Python problems (1.4K samples)
-
- ### Language
- - **IMDB Reviews** - Sentiment analysis (100K samples)
- - **CNN/DailyMail** - Summarization (936K samples)
-
- ## 🎯 **Evaluation Metrics**
-
- - **Accuracy** - Percentage of correct predictions
- - **F1 Score** - Harmonic mean of precision and recall
- - **BLEU Score** - Text generation quality
- - **ROUGE Score** - Summarization quality
- - **Pass@K** - Code generation success rate
-
- ## 🚀 **Quick Start**
-
- ### Option 1: Direct Upload to Hugging Face Spaces
-
- 1. Create a new Space on Hugging Face
- 2. Choose "Docker" as the SDK
- 3. Upload these files:
-    - `app.py` (renamed from `advanced_novaeval_app.py`)
-    - `requirements.txt`
-    - `Dockerfile`
-    - `README.md`
- 4. Commit and push - your Space will build automatically!
-
- ### Option 2: Local Development

 ```bash
 # Install dependencies
- pip install -r requirements.txt

- # Run the application
- python advanced_novaeval_app.py

- # Open browser to http://localhost:7860
 ```

- ## 🔧 **Configuration Options**

- ### Model Parameters
- - **Sample Size**: 10-1000 samples
- - **Temperature**: 0.0-2.0 (creativity control)
- - **Max Tokens**: 128-2048 (response length)
- - **Top-p**: 0.9 (nucleus sampling)

- ### Evaluation Settings
- - **Multiple Model Selection**: Compare up to 10 models
- - **Flexible Metrics**: Choose relevant metrics for your task
- - **Real-time Monitoring**: Watch evaluations progress live
- - **Export Results**: Download results in JSON format

- ## 📱 **User Experience**

- ### Workflow
- 1. **Select Models** - Choose from 15+ Hugging Face models
- 2. **Pick Dataset** - Select from 11 evaluation datasets
- 3. **Configure Metrics** - Choose relevant evaluation metrics
- 4. **Set Parameters** - Adjust sample size, temperature, etc.
- 5. **Start Evaluation** - Watch real-time progress and logs
- 6. **View Results** - Analyze performance comparisons

- ### Features
- - **Model Search** - Find models by name or provider
- - **Category Filtering** - Filter by model size or dataset type
- - **Real-time Logs** - See actual evaluation steps
- - **Progress Tracking** - Visual progress bars and percentages
- - **Interactive Results** - Compare models side-by-side

- ## 🌟 **Why NovaEval?**

- ### For Researchers
- - **Comprehensive Benchmarking** across multiple models and datasets
- - **Standardized Evaluation** with consistent metrics and procedures
- - **Real-time Monitoring** to track evaluation progress
- - **Export Capabilities** for further analysis

- ### For Developers
- - **Easy Integration** with Hugging Face ecosystem
- - **No API Keys Required** - uses free HF Inference API
- - **Modern Interface** with responsive design
- - **Detailed Logging** for debugging and analysis

- ### For Teams
- - **Collaborative Evaluation** with shareable results
- - **Professional Interface** suitable for presentations
- - **Comprehensive Documentation** for easy onboarding
- - **Open Source** with full customization capabilities

- ## 🔗 **Links**

- - **Noveum.ai**: [https://noveum.ai](https://noveum.ai)
- - **NovaEval Framework**: [https://github.com/Noveum/NovaEval](https://github.com/Noveum/NovaEval)
- - **Hugging Face Models**: [https://huggingface.co/models](https://huggingface.co/models)
- - **Documentation**: Available in the application interface

- ## 📄 **License**

- This project is open source and available under the MIT License.

- ## 🤝 **Contributing**

- We welcome contributions! Please see our contributing guidelines for more information.

- ---

- **Built with ❤️ by [Noveum.ai](https://noveum.ai) - Advancing AI Evaluation**

 ---
+ title: NovaEval
+ emoji: 🐠
+ colorFrom: indigo
+ colorTo: red
+ sdk: static
 pinned: false
+ app_build_command: npm run build
+ app_file: build/index.html
+ license: apache-2.0
+ short_description: A comprehensive AI model evaluation framework.
 ---
 # NovaEval by Noveum.ai

+ [![CI](https://github.com/Noveum/NovaEval/actions/workflows/ci.yml/badge.svg)](https://github.com/Noveum/NovaEval/actions/workflows/ci.yml)
+ [![Release](https://github.com/Noveum/NovaEval/actions/workflows/release.yml/badge.svg)](https://github.com/Noveum/NovaEval/actions/workflows/release.yml)
+ [![codecov](https://codecov.io/gh/Noveum/NovaEval/branch/main/graph/badge.svg)](https://codecov.io/gh/Noveum/NovaEval)
+ [![PyPI version](https://badge.fury.io/py/novaeval.svg)](https://badge.fury.io/py/novaeval)
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+ [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+
+ A comprehensive, extensible AI model evaluation framework designed for production use. NovaEval provides a unified interface for evaluating language models across various datasets, metrics, and deployment scenarios.
+
+ ## 🚧 Development Status
+
+ > **⚠️ ACTIVE DEVELOPMENT - NOT PRODUCTION READY**
+ >
+ > NovaEval is currently in active development and **not recommended for production use**. We are actively working on improving stability, adding features, and expanding test coverage. APIs may change without notice.
+ >
+ > **We're looking for contributors!** See the [Contributing](#-contributing) section below for ways to help.
+
+ ## 🤝 We Need Your Help!
+
+ NovaEval is an open-source project that thrives on community contributions. Whether you're a seasoned developer or just getting started, there are many ways to contribute:
+
+ ### 🎯 High-Priority Contribution Areas
+
+ We're actively looking for contributors in these key areas:
+
+ - **🧪 Unit Tests**: Help us improve our test coverage (currently 23% overall, 90%+ for core modules)
+ - **📚 Examples**: Create real-world evaluation examples and use cases
+ - **📝 Guides & Notebooks**: Write evaluation guides and interactive Jupyter notebooks
+ - **📖 Documentation**: Improve API documentation and user guides
+ - **🔍 RAG Metrics**: Add more metrics specifically for Retrieval-Augmented Generation evaluation
+ - **🤖 Agent Evaluation**: Build frameworks for evaluating AI agents and multi-turn conversations
+
+ ### 🚀 Getting Started as a Contributor
+
+ 1. **Start Small**: Pick up issues labeled `good first issue` or `help wanted`
+ 2. **Join Discussions**: Share your ideas in [GitHub Discussions](https://github.com/Noveum/NovaEval/discussions)
+ 3. **Review Code**: Help review pull requests and provide feedback
+ 4. **Report Issues**: Found a bug? Report it in [GitHub Issues](https://github.com/Noveum/NovaEval/issues)
+ 5. **Spread the Word**: Star the repository and share with your network

 ## 🚀 Features

+ - **Multi-Model Support**: Evaluate models from OpenAI, Anthropic, AWS Bedrock, and custom providers
+ - **Extensible Scoring**: Built-in scorers for accuracy, semantic similarity, code evaluation, and custom metrics
+ - **Dataset Integration**: Support for MMLU, HuggingFace datasets, custom datasets, and more
+ - **Production Ready**: Docker support, Kubernetes deployment, and cloud integrations
+ - **Comprehensive Reporting**: Detailed evaluation reports, artifacts, and visualizations
+ - **Secure**: Built-in credential management and secret store integration
+ - **Scalable**: Designed for both local testing and large-scale production evaluations
+ - **Cross-Platform**: Tested on macOS, Linux, and Windows with comprehensive CI/CD
+
+ ## 📦 Installation
+
+ ### From PyPI (Recommended)
+
+ ```bash
+ pip install novaeval
+ ```
+
+ ### From Source
+
+ ```bash
+ git clone https://github.com/Noveum/NovaEval.git
+ cd NovaEval
+ pip install -e .
+ ```
+
+ ### Docker
+
+ ```bash
+ docker pull noveum/novaeval:latest
+ ```
+
+ ## 🏃‍♂️ Quick Start
+
+ ### Basic Evaluation
+
+ ```python
+ from novaeval import Evaluator
+ from novaeval.datasets import MMLUDataset
+ from novaeval.models import OpenAIModel
+ from novaeval.scorers import AccuracyScorer
+
+ # Configure for cost-conscious evaluation
+ MAX_TOKENS = 100  # Adjust based on budget: 5-10 for answers, 100+ for reasoning
+
+ # Initialize components
+ dataset = MMLUDataset(
+     subset="elementary_mathematics",  # Easier subset for demo
+     num_samples=10,
+     split="test"
+ )
+
+ model = OpenAIModel(
+     model_name="gpt-4o-mini",  # Cost-effective model
+     temperature=0.0,
+     max_tokens=MAX_TOKENS
+ )
+
+ scorer = AccuracyScorer(extract_answer=True)
+
+ # Create and run evaluation
+ evaluator = Evaluator(
+     dataset=dataset,
+     models=[model],
+     scorers=[scorer],
+     output_dir="./results"
+ )
+
+ results = evaluator.run()
+
+ # Display detailed results
+ for model_name, model_results in results["model_results"].items():
+     for scorer_name, score_info in model_results["scores"].items():
+         if isinstance(score_info, dict):
+             mean_score = score_info.get("mean", 0)
+             count = score_info.get("count", 0)
+             print(f"{scorer_name}: {mean_score:.4f} ({count} samples)")
+ ```
+
+ ### Configuration-Based Evaluation
+
+ ```python
+ from novaeval import Evaluator
+
+ # Load configuration from YAML/JSON
+ evaluator = Evaluator.from_config("evaluation_config.yaml")
+ results = evaluator.run()
+ ```
+
+ ### Command Line Interface
+
+ NovaEval provides a comprehensive CLI for running evaluations:
+
+ ```bash
+ # Run evaluation from configuration file
+ novaeval run config.yaml
+
+ # Quick evaluation with minimal setup
+ novaeval quick -d mmlu -m gpt-4 -s accuracy
+
+ # List available datasets, models, and scorers
+ novaeval list-datasets
+ novaeval list-models
+ novaeval list-scorers
+
+ # Generate sample configuration
+ novaeval generate-config sample-config.yaml
+ ```
+
+ 📖 **[Complete CLI Reference](docs/cli-reference.md)** - Detailed documentation for all CLI commands and options
+
+ ### Example Configuration
+
+ ```yaml
+ # evaluation_config.yaml
+ dataset:
+   type: "mmlu"
+   subset: "abstract_algebra"
+   num_samples: 500
+
+ models:
+   - type: "openai"
+     model_name: "gpt-4"
+     temperature: 0.0
+   - type: "anthropic"
+     model_name: "claude-3-opus"
+     temperature: 0.0
+
+ scorers:
+   - type: "accuracy"
+   - type: "semantic_similarity"
+     threshold: 0.8
+
+ output:
+   directory: "./results"
+   formats: ["json", "csv", "html"]
+   upload_to_s3: true
+   s3_bucket: "my-eval-results"
+ ```
+
+ ## 🏗️ Architecture
+
+ NovaEval is built with extensibility and modularity in mind:
+
+ ```
+ src/novaeval/
+ ├── datasets/      # Dataset loaders and processors
+ ├── evaluators/    # Core evaluation logic
+ ├── integrations/  # External service integrations
+ ├── models/        # Model interfaces and adapters
+ ├── reporting/     # Report generation and visualization
+ ├── scorers/       # Scoring mechanisms and metrics
+ └── utils/         # Utility functions and helpers
+ ```
+
+ ### Core Components
+
+ - **Datasets**: Standardized interface for loading evaluation datasets
+ - **Models**: Unified API for different AI model providers
+ - **Scorers**: Pluggable scoring mechanisms for various evaluation metrics
+ - **Evaluators**: Orchestrate the evaluation process (see the sketch below)
+ - **Reporting**: Generates comprehensive reports and artifacts
+ - **Integrations**: Handles external services (S3, credential stores, etc.)
+
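+ To make the orchestration concrete, here is a minimal conceptual sketch of what an evaluator's run loop does. It is illustrative only, not the actual `Evaluator` implementation; it reuses the `generate()` and `score()` signatures shown under Extending NovaEval below, while the `sample["input"]`/`sample["expected"]` keys and the `model_name` attribute are assumptions made for the example.
+
+ ```python
+ # Conceptual sketch only: the real Evaluator adds batching, retries,
+ # logging, and report generation on top of a loop like this.
+ def run_evaluation(dataset, models, scorers):
+     results = {}
+     for model in models:
+         per_scorer = {type(s).__name__: [] for s in scorers}
+         for sample in dataset:  # "input"/"expected" keys are assumed here
+             prediction = model.generate(sample["input"])
+             for s in scorers:
+                 per_scorer[type(s).__name__].append(
+                     s.score(prediction, sample["expected"])
+                 )
+         # Aggregate into the mean/count shape printed in the Quick Start
+         results[model.model_name] = {  # model_name attribute is assumed
+             name: {"mean": sum(v) / max(len(v), 1), "count": len(v)}
+             for name, v in per_scorer.items()
+         }
+     return results
+ ```
+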
+ ## 📊 Supported Datasets
+
+ - **MMLU**: Massive Multitask Language Understanding
+ - **HuggingFace**: Any dataset from the HuggingFace Hub (see the sketch after this list)
+ - **Custom**: JSON, CSV, or programmatic dataset definitions
+ - **Code Evaluation**: Programming benchmarks and code generation tasks
+ - **Agent Traces**: Multi-turn conversation and agent evaluation
+
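+ For MMLU, the concrete loader is the `MMLUDataset` class from the Quick Start. A Hub-backed loader plausibly follows the same shape, but the `HuggingFaceDataset` name and its arguments below are hypothetical and not confirmed by this README, so check the API docs for the real class:
+
+ ```python
+ from novaeval.datasets import MMLUDataset  # confirmed by the Quick Start
+
+ # Documented usage: subset, sample count, and split are configurable.
+ mmlu = MMLUDataset(subset="abstract_algebra", num_samples=500, split="test")
+
+ # Hypothetical sketch of a Hub loader (class name and arguments assumed):
+ # from novaeval.datasets import HuggingFaceDataset
+ # gsm8k = HuggingFaceDataset(dataset_name="gsm8k", split="test", num_samples=100)
+ ```
+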
+ ## 🤖 Supported Models
+
+ - **OpenAI**: GPT-3.5, GPT-4, and newer models
+ - **Anthropic**: Claude family models
+ - **AWS Bedrock**: Amazon's managed AI services
+ - **Noveum AI Gateway**: Integration with Noveum's model gateway
+ - **Custom**: Extensible interface for any API-based model
+
+ ## 📏 Built-in Scorers
+
+ ### Accuracy-Based
+ - **ExactMatch**: Exact string matching
+ - **Accuracy**: Classification accuracy
+ - **F1Score**: F1 score for classification tasks
+
+ ### Semantic-Based
+ - **SemanticSimilarity**: Embedding-based similarity scoring
+ - **BERTScore**: BERT-based semantic evaluation
+ - **RougeScore**: ROUGE metrics for text generation
+
+ ### Code-Specific
+ - **CodeExecution**: Execute and validate code outputs
+ - **SyntaxChecker**: Validate code syntax
+ - **TestCoverage**: Code coverage analysis
+
+ ### Custom
+ - **LLMJudge**: Use another LLM as a judge
+ - **HumanEval**: Integration with human evaluation workflows
+
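+ Scorers are pluggable and can be combined in a single run, since `Evaluator` accepts a list. A minimal sketch; only `AccuracyScorer` is confirmed by the Quick Start, while the semantic-similarity import and its `threshold` argument mirror the YAML example above and are assumptions:
+
+ ```python
+ from novaeval.scorers import AccuracyScorer
+ # from novaeval.scorers import SemanticSimilarityScorer  # assumed class name
+
+ scorers = [
+     AccuracyScorer(extract_answer=True),
+     # SemanticSimilarityScorer(threshold=0.8),  # mirrors the YAML config above
+ ]
+ # Pass the list to Evaluator(..., scorers=scorers) exactly as in the Quick Start.
+ ```
+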
+ ## 🚀 Deployment
+
+ ### Local Development

 ```bash
 # Install dependencies
+ pip install -e ".[dev]"

+ # Run tests
+ pytest

+ # Run example evaluation
+ python examples/basic_evaluation.py
 ```

+ ### Docker
+
+ ```bash
+ # Build image
+ docker build -t nova-eval .
+
+ # Run evaluation
+ docker run -v $(pwd)/config:/config -v $(pwd)/results:/results nova-eval --config /config/eval.yaml
+ ```

+ ### Kubernetes

+ ```bash
+ # Deploy to Kubernetes
+ kubectl apply -f kubernetes/

+ # Check status
+ kubectl get pods -l app=nova-eval
+ ```

+ ## 🔧 Configuration

+ NovaEval supports configuration through:

+ - **YAML/JSON files**: Declarative configuration
+ - **Environment variables**: Runtime configuration
+ - **Python code**: Programmatic configuration
+ - **CLI arguments**: Command-line overrides

+ ### Environment Variables

+ ```bash
+ export NOVA_EVAL_OUTPUT_DIR="./results"
+ export NOVA_EVAL_LOG_LEVEL="INFO"
+ export OPENAI_API_KEY="your-api-key"
+ export AWS_ACCESS_KEY_ID="your-aws-key"
+ ```

+ ### CI/CD Integration

+ NovaEval includes optimized GitHub Actions workflows:
+ - **Unit tests** run on all PRs and pushes for quick feedback
+ - **Integration tests** run on main branch only to minimize API costs
+ - **Cross-platform testing** on macOS, Linux, and Windows

+ ## 📈 Reporting and Artifacts

+ NovaEval generates comprehensive evaluation reports:

+ - **Summary Reports**: High-level metrics and insights
+ - **Detailed Results**: Per-sample predictions and scores
+ - **Visualizations**: Charts and graphs for result analysis
+ - **Artifacts**: Model outputs, intermediate results, and debug information
+ - **Export Formats**: JSON, CSV, HTML, PDF

+ ### Example Report Structure

+ ```
+ results/
+ ├── summary.json              # High-level metrics
+ ├── detailed_results.csv      # Per-sample results
+ ├── artifacts/
+ │   ├── model_outputs/        # Raw model responses
+ │   ├── intermediate/         # Processing artifacts
+ │   └── debug/                # Debug information
+ ├── visualizations/
+ │   ├── accuracy_by_category.png
+ │   ├── score_distribution.png
+ │   └── confusion_matrix.png
+ └── report.html               # Interactive HTML report
+ ```

+ ## 🔌 Extending NovaEval
+
+ ### Custom Datasets
+
+ ```python
+ from novaeval.datasets import BaseDataset
+
+ class MyCustomDataset(BaseDataset):
+     def load_data(self):
+         # Implement data loading logic
+         return samples
+
+     def get_sample(self, index):
+         # Return individual sample
+         return sample
+ ```
+
+ ### Custom Scorers
+
+ ```python
+ from novaeval.scorers import BaseScorer
+
+ class MyCustomScorer(BaseScorer):
+     def score(self, prediction, ground_truth, context=None):
+         # Implement scoring logic
+         return score
+ ```
+
+ ### Custom Models
+
+ ```python
+ from novaeval.models import BaseModel
+
+ class MyCustomModel(BaseModel):
+     def generate(self, prompt, **kwargs):
+         # Implement model inference
+         return response
+ ```
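+
+ These pieces plug into the same `Evaluator` API used in the Quick Start. A minimal wiring sketch; constructor arguments for the custom classes are whatever your own implementations define:
+
+ ```python
+ evaluator = Evaluator(
+     dataset=MyCustomDataset(),
+     models=[MyCustomModel()],
+     scorers=[MyCustomScorer()],
+     output_dir="./results",
+ )
+ results = evaluator.run()
+ ```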

+ ## 🤝 Contributing
+
+ We welcome contributions! NovaEval is actively seeking contributors to help build a robust AI evaluation framework. Please see our [Contributing Guide](CONTRIBUTING.md) for detailed guidelines.
+
+ ### 🎯 Priority Contribution Areas
+
+ As mentioned in the [We Need Your Help](#-we-need-your-help) section, we're particularly looking for help with:
+
+ 1. **Unit Tests** - Expand test coverage beyond the current 23%
+ 2. **Examples** - Real-world evaluation scenarios and use cases
+ 3. **Guides & Notebooks** - Interactive evaluation tutorials
+ 4. **Documentation** - API docs, user guides, and tutorials
+ 5. **RAG Metrics** - Specialized metrics for retrieval-augmented generation
+ 6. **Agent Evaluation** - Frameworks for multi-turn and agent-based evaluations
+
+ ### Development Setup
+
+ ```bash
+ # Clone repository
+ git clone https://github.com/Noveum/NovaEval.git
+ cd NovaEval
+
+ # Create virtual environment
+ python -m venv venv
+ source venv/bin/activate  # On Windows: venv\Scripts\activate
+
+ # Install development dependencies
+ pip install -e ".[dev]"
+
+ # Install pre-commit hooks
+ pre-commit install
+
+ # Run tests
+ pytest
+
+ # Run with coverage
+ pytest --cov=src/novaeval --cov-report=html
+ ```
+
+ ### 🏗️ Contribution Workflow
+
+ 1. **Fork** the repository
+ 2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. **Make** your changes following our coding standards
+ 4. **Add** tests for your changes
+ 5. **Commit** your changes (`git commit -m 'Add amazing feature'`)
+ 6. **Push** to the branch (`git push origin feature/amazing-feature`)
+ 7. **Open** a Pull Request
+
+ ### 📋 Contribution Guidelines
+
+ - **Code Quality**: Follow PEP 8 and use the provided pre-commit hooks
+ - **Testing**: Add unit tests for new features and bug fixes
+ - **Documentation**: Update documentation for API changes
+ - **Commit Messages**: Use conventional commit format
+ - **Issues**: Reference relevant issues in your PR description
+
+ ### 🎉 Recognition
+
+ Contributors will be:
+ - Listed on our contributors page
+ - Mentioned in release notes for significant contributions
+ - Invited to join our contributor Discord community
+
+ ## 📄 License
+
+ This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
+
+ ## 🙏 Acknowledgments
+
+ - Inspired by evaluation frameworks like DeepEval, Confident AI, and Braintrust
+ - Built with modern Python best practices and industry standards
+ - Designed for the AI evaluation community
+
+ ## 📞 Support
+
+ - **Documentation**: [https://noveum.github.io/NovaEval](https://noveum.github.io/NovaEval)
+ - **Issues**: [GitHub Issues](https://github.com/Noveum/NovaEval/issues)
+ - **Discussions**: [GitHub Discussions](https://github.com/Noveum/NovaEval/discussions)
+ - **Email**: [email protected]
+
+ ---

+ Made with ❤️ by the Noveum.ai team
app.py DELETED
@@ -1,1447 +0,0 @@
- """
- NovaEval Space by Noveum.ai
- Advanced AI Model Evaluation Platform using NovaEval Framework
- """
-
- import asyncio
- import json
- import logging
- import os
- import sys
- import time
- import uuid
- from datetime import datetime
- from typing import Dict, List, Optional, Any
- import uvicorn
- from fastapi import FastAPI, WebSocket, WebSocketDisconnect, HTTPException
- from fastapi.responses import HTMLResponse
- from fastapi.middleware.cors import CORSMiddleware
- from pydantic import BaseModel
- import httpx
- import traceback
-
- # Configure comprehensive logging
- logging.basicConfig(
-     level=logging.INFO,
-     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
-     handlers=[logging.StreamHandler(sys.stdout)]
- )
- logger = logging.getLogger(__name__)
-
- app = FastAPI(
-     title="NovaEval by Noveum.ai",
-     description="Advanced AI Model Evaluation Platform using NovaEval Framework",
-     version="4.0.0"
- )
-
- app.add_middleware(
-     CORSMiddleware,
-     allow_origins=["*"],
-     allow_credentials=True,
-     allow_methods=["*"],
-     allow_headers=["*"],
- )
-
- # Pydantic Models
- class EvaluationRequest(BaseModel):
-     models: List[str]
-     dataset: str
-     metrics: List[str]
-     sample_size: int = 50
-     temperature: float = 0.7
-     max_tokens: int = 512
-     top_p: float = 0.9
-
- class EvaluationResponse(BaseModel):
-     evaluation_id: str
-     status: str
-     message: str
-
- # Global state
- active_evaluations = {}
- websocket_connections = {}
- request_logs = []
-
- # Hugging Face Models Configuration
- HF_MODELS = {
-     "small": [
-         {
-             "id": "google/flan-t5-large",
-             "name": "FLAN-T5 Large",
-             "size": "0.8B",
-             "description": "Instruction-tuned T5 model for various NLP tasks",
-             "capabilities": ["text-generation", "reasoning", "qa"],
-             "provider": "Google"
-         },
-         {
-             "id": "Qwen/Qwen2.5-3B",
-             "name": "Qwen 2.5 3B",
-             "size": "3B",
-             "description": "Latest Qwen model with strong reasoning capabilities",
-             "capabilities": ["text-generation", "reasoning", "multilingual"],
-             "provider": "Alibaba"
-         },
-         {
-             "id": "google/gemma-2b",
-             "name": "Gemma 2B",
-             "size": "2B",
-             "description": "Efficient small model based on Gemini research",
-             "capabilities": ["text-generation", "reasoning"],
-             "provider": "Google"
-         }
-     ],
-     "medium": [
-         {
-             "id": "Qwen/Qwen2.5-7B",
-             "name": "Qwen 2.5 7B",
-             "size": "7B",
-             "description": "Balanced performance and efficiency for most tasks",
-             "capabilities": ["text-generation", "reasoning", "analysis"],
-             "provider": "Alibaba"
-         },
-         {
-             "id": "mistralai/Mistral-7B-v0.1",
-             "name": "Mistral 7B",
-             "size": "7B",
-             "description": "High-performance open model with Apache 2.0 license",
-             "capabilities": ["text-generation", "reasoning", "analysis"],
-             "provider": "Mistral AI"
-         },
-         {
-             "id": "microsoft/DialoGPT-medium",
-             "name": "DialoGPT Medium",
-             "size": "345M",
-             "description": "Specialized for conversational AI applications",
-             "capabilities": ["conversation", "dialogue"],
-             "provider": "Microsoft"
-         },
-         {
-             "id": "codellama/CodeLlama-7b-Python-hf",
-             "name": "CodeLlama 7B Python",
-             "size": "7B",
-             "description": "Specialized for Python code generation and understanding",
-             "capabilities": ["code-generation", "python"],
-             "provider": "Meta"
-         }
-     ],
-     "large": [
-         {
-             "id": "Qwen/Qwen2.5-14B",
-             "name": "Qwen 2.5 14B",
-             "size": "14B",
-             "description": "High-performance model for complex reasoning tasks",
-             "capabilities": ["text-generation", "reasoning", "analysis", "complex-tasks"],
-             "provider": "Alibaba"
-         },
-         {
-             "id": "Qwen/Qwen2.5-32B",
-             "name": "Qwen 2.5 32B",
-             "size": "32B",
-             "description": "Large-scale model for advanced AI applications",
-             "capabilities": ["text-generation", "reasoning", "analysis", "complex-tasks"],
-             "provider": "Alibaba"
-         },
-         {
-             "id": "Qwen/Qwen2.5-72B",
-             "name": "Qwen 2.5 72B",
-             "size": "72B",
-             "description": "State-of-the-art open model for research and production",
-             "capabilities": ["text-generation", "reasoning", "analysis", "complex-tasks"],
-             "provider": "Alibaba"
-         }
-     ]
- }
-
- # Evaluation Datasets Configuration
- EVALUATION_DATASETS = {
-     "reasoning": [
-         {
-             "id": "Rowan/hellaswag",
-             "name": "HellaSwag",
-             "description": "Commonsense reasoning benchmark testing story completion",
-             "samples": 60000,
-             "task_type": "multiple_choice",
-             "difficulty": "medium"
-         },
-         {
-             "id": "tau/commonsense_qa",
-             "name": "CommonsenseQA",
-             "description": "Multiple-choice questions requiring commonsense reasoning",
-             "samples": 12100,
-             "task_type": "multiple_choice",
-             "difficulty": "medium"
-         },
-         {
-             "id": "allenai/ai2_arc",
-             "name": "ARC (AI2 Reasoning Challenge)",
-             "description": "Science exam questions requiring reasoning skills",
-             "samples": 7790,
-             "task_type": "multiple_choice",
-             "difficulty": "hard"
-         }
-     ],
-     "knowledge": [
-         {
-             "id": "cais/mmlu",
-             "name": "MMLU",
-             "description": "Massive Multitask Language Understanding across 57 subjects",
-             "samples": 231000,
-             "task_type": "multiple_choice",
-             "difficulty": "hard"
-         },
-         {
-             "id": "google/boolq",
-             "name": "BoolQ",
-             "description": "Yes/No questions requiring reading comprehension",
-             "samples": 12700,
-             "task_type": "yes_no",
-             "difficulty": "medium"
-         }
-     ],
-     "math": [
-         {
-             "id": "openai/gsm8k",
-             "name": "GSM8K",
-             "description": "Grade school math word problems with step-by-step solutions",
-             "samples": 17600,
-             "task_type": "generation",
-             "difficulty": "medium"
-         },
-         {
-             "id": "deepmind/aqua_rat",
-             "name": "AQUA-RAT",
-             "description": "Algebraic word problems with rationales",
-             "samples": 196000,
-             "task_type": "multiple_choice",
-             "difficulty": "hard"
-         }
-     ],
-     "code": [
-         {
-             "id": "openai/openai_humaneval",
-             "name": "HumanEval",
-             "description": "Python programming problems for code generation evaluation",
-             "samples": 164,
-             "task_type": "code_generation",
-             "difficulty": "hard"
-         },
-         {
-             "id": "google-research-datasets/mbpp",
-             "name": "MBPP",
-             "description": "Mostly Basic Python Problems for code understanding",
-             "samples": 1400,
-             "task_type": "code_generation",
-             "difficulty": "medium"
-         }
-     ],
-     "language": [
-         {
-             "id": "stanfordnlp/imdb",
-             "name": "IMDB Reviews",
-             "description": "Movie review sentiment classification dataset",
-             "samples": 100000,
-             "task_type": "classification",
-             "difficulty": "easy"
-         },
-         {
-             "id": "abisee/cnn_dailymail",
-             "name": "CNN/DailyMail",
-             "description": "News article summarization dataset",
-             "samples": 936000,
-             "task_type": "summarization",
-             "difficulty": "medium"
-         }
-     ]
- }
-
- # Evaluation Metrics
- EVALUATION_METRICS = [
-     {
-         "id": "accuracy",
-         "name": "Accuracy",
-         "description": "Percentage of correct predictions",
-         "applicable_tasks": ["multiple_choice", "yes_no", "classification"]
-     },
-     {
-         "id": "f1_score",
-         "name": "F1 Score",
-         "description": "Harmonic mean of precision and recall",
-         "applicable_tasks": ["classification", "multiple_choice"]
-     },
-     {
-         "id": "bleu",
-         "name": "BLEU Score",
-         "description": "Quality metric for text generation tasks",
-         "applicable_tasks": ["generation", "summarization", "code_generation"]
-     },
-     {
-         "id": "rouge",
-         "name": "ROUGE Score",
-         "description": "Recall-oriented metric for summarization",
-         "applicable_tasks": ["summarization", "generation"]
-     },
-     {
-         "id": "pass_at_k",
-         "name": "Pass@K",
-         "description": "Percentage of problems solved correctly in code generation",
-         "applicable_tasks": ["code_generation"]
-     }
- ]
-
- def log_request(request_type: str, data: dict, response: dict = None, error: str = None):
-     """Log all requests and responses for debugging"""
-     log_entry = {
-         "timestamp": datetime.now().isoformat(),
-         "request_type": request_type,
-         "request_data": data,
-         "response": response,
-         "error": error,
-         "id": str(uuid.uuid4())
-     }
-     request_logs.append(log_entry)
-
-     # Keep only last 1000 logs to prevent memory issues
-     if len(request_logs) > 1000:
-         request_logs.pop(0)
-
-     # Log to console
-     logger.info(f"REQUEST [{request_type}]: {json.dumps(log_entry, indent=2)}")
-
- async def send_websocket_message(evaluation_id: str, message: dict):
-     """Send message to WebSocket connection if exists"""
-     if evaluation_id in websocket_connections:
-         try:
-             await websocket_connections[evaluation_id].send_text(json.dumps(message))
-             log_request("websocket_send", {"evaluation_id": evaluation_id, "message": message})
-         except Exception as e:
-             logger.error(f"Failed to send WebSocket message: {e}")
-
- async def call_huggingface_api(model_id: str, prompt: str, max_tokens: int = 512, temperature: float = 0.7):
-     """Call Hugging Face Inference API"""
-     try:
-         headers = {
-             "Content-Type": "application/json"
-         }
-
-         payload = {
-             "inputs": prompt,
-             "parameters": {
-                 "max_new_tokens": max_tokens,
-                 "temperature": temperature,
-                 "return_full_text": False
-             }
-         }
-
-         url = f"https://api-inference.huggingface.co/models/{model_id}"
-
-         log_request("hf_api_call", {
-             "model_id": model_id,
-             "url": url,
-             "payload": payload
-         })
-
-         async with httpx.AsyncClient(timeout=30.0) as client:
-             response = await client.post(url, headers=headers, json=payload)
-             response_data = response.json()
-
-             log_request("hf_api_response", {
-                 "model_id": model_id,
-                 "status_code": response.status_code,
-                 "response": response_data
-             })
-
-             if response.status_code == 200:
-                 return response_data
-             else:
-                 raise Exception(f"API Error: {response_data}")
-
-     except Exception as e:
-         log_request("hf_api_error", {"model_id": model_id, "error": str(e)})
-         raise e
-
- async def run_novaeval_evaluation(evaluation_id: str, request: EvaluationRequest):
-     """Run actual NovaEval evaluation with detailed logging"""
-     try:
-         # Initialize evaluation
-         active_evaluations[evaluation_id] = {
-             "status": "running",
-             "progress": 0,
-             "current_step": "Initializing NovaEval",
-             "results": {},
-             "logs": [],
-             "start_time": datetime.now(),
-             "request": request.dict()
-         }
-
-         await send_websocket_message(evaluation_id, {
-             "type": "log",
-             "timestamp": datetime.now().isoformat(),
-             "level": "INFO",
-             "message": f"🚀 Starting NovaEval evaluation with {len(request.models)} models"
-         })
-
-         await send_websocket_message(evaluation_id, {
-             "type": "log",
-             "timestamp": datetime.now().isoformat(),
-             "level": "INFO",
-             "message": f"📊 Dataset: {request.dataset} | Sample size: {request.sample_size}"
-         })
-
-         await send_websocket_message(evaluation_id, {
-             "type": "log",
-             "timestamp": datetime.now().isoformat(),
-             "level": "INFO",
-             "message": f"📏 Metrics: {', '.join(request.metrics)} | Temperature: {request.temperature}"
-         })
-
-         total_steps = len(request.models) * 6  # 6 steps per model
-         current_step = 0
-
-         # Process each model with NovaEval
-         for model_id in request.models:
-             model_name = model_id.split('/')[-1]
-
-             # Step 1: Initialize NovaEval for model
-             current_step += 1
-             await send_websocket_message(evaluation_id, {
-                 "type": "progress",
-                 "progress": (current_step / total_steps) * 100,
-                 "current_step": f"Initializing NovaEval for {model_name}"
-             })
-
-             await send_websocket_message(evaluation_id, {
-                 "type": "log",
-                 "timestamp": datetime.now().isoformat(),
-                 "level": "INFO",
-                 "message": f"🤖 Setting up NovaEval for model: {model_id}"
-             })
-
-             await asyncio.sleep(1)
-
-             # Step 2: Load dataset
-             current_step += 1
-             await send_websocket_message(evaluation_id, {
-                 "type": "progress",
-                 "progress": (current_step / total_steps) * 100,
-                 "current_step": f"Loading dataset for {model_name}"
-             })
-
-             await send_websocket_message(evaluation_id, {
-                 "type": "log",
-                 "timestamp": datetime.now().isoformat(),
-                 "level": "INFO",
-                 "message": f"📥 Loading dataset: {request.dataset}"
-             })
-
-             await asyncio.sleep(1)
-
-             # Step 3: Prepare evaluation samples
-             current_step += 1
-             await send_websocket_message(evaluation_id, {
-                 "type": "progress",
-                 "progress": (current_step / total_steps) * 100,
-                 "current_step": f"Preparing {request.sample_size} samples for {model_name}"
-             })
-
-             await send_websocket_message(evaluation_id, {
-                 "type": "log",
-                 "timestamp": datetime.now().isoformat(),
-                 "level": "INFO",
-                 "message": f"🔧 Preparing {request.sample_size} evaluation samples"
-             })
-
-             await asyncio.sleep(1)
-
-             # Step 4: Run NovaEval evaluation
-             current_step += 1
-             await send_websocket_message(evaluation_id, {
-                 "type": "progress",
-                 "progress": (current_step / total_steps) * 100,
-                 "current_step": f"Running NovaEval on {model_name}"
-             })
-
-             await send_websocket_message(evaluation_id, {
-                 "type": "log",
-                 "timestamp": datetime.now().isoformat(),
-                 "level": "INFO",
-                 "message": f"🧪 Running NovaEval evaluation on {request.sample_size} samples"
-             })
-
-             # Simulate actual evaluation with sample requests
-             sample_requests = min(5, request.sample_size // 10)  # Show some sample requests
-             for i in range(sample_requests):
-                 sample_prompt = f"Sample evaluation prompt {i+1} for {request.dataset}"
-
-                 await send_websocket_message(evaluation_id, {
-                     "type": "log",
-                     "timestamp": datetime.now().isoformat(),
-                     "level": "DEBUG",
-                     "message": f"📝 REQUEST to {model_name}: {sample_prompt}"
-                 })
-
-                 try:
-                     # Make actual API call
-                     response = await call_huggingface_api(model_id, sample_prompt, request.max_tokens, request.temperature)
-                     response_text = response[0]['generated_text'] if response and len(response) > 0 else "No response"
-
-                     await send_websocket_message(evaluation_id, {
-                         "type": "log",
-                         "timestamp": datetime.now().isoformat(),
-                         "level": "DEBUG",
-                         "message": f"📤 RESPONSE from {model_name}: {response_text[:100]}..."
-                     })
-
-                 except Exception as e:
-                     await send_websocket_message(evaluation_id, {
-                         "type": "log",
-                         "timestamp": datetime.now().isoformat(),
-                         "level": "WARNING",
-                         "message": f"⚠️ API Error for {model_name}: {str(e)}"
-                     })
-
-                 await asyncio.sleep(0.5)
-
-             # Step 5: Calculate metrics with NovaEval
-             current_step += 1
-             await send_websocket_message(evaluation_id, {
-                 "type": "progress",
-                 "progress": (current_step / total_steps) * 100,
-                 "current_step": f"Calculating metrics for {model_name}"
-             })
-
-             await send_websocket_message(evaluation_id, {
-                 "type": "log",
-                 "timestamp": datetime.now().isoformat(),
-                 "level": "INFO",
-                 "message": f"📊 NovaEval calculating metrics: {', '.join(request.metrics)}"
-             })
-
-             await asyncio.sleep(2)
-
-             # Step 6: Generate results
-             current_step += 1
-             await send_websocket_message(evaluation_id, {
-                 "type": "progress",
-                 "progress": (current_step / total_steps) * 100,
-                 "current_step": f"Finalizing results for {model_name}"
-             })
-
-             # Generate realistic results based on model and dataset
-             results = {}
-             base_score = 0.65 + (hash(model_id + request.dataset) % 30) / 100
-
-             for metric in request.metrics:
-                 if metric == "accuracy":
-                     results[metric] = round(base_score + (hash(model_id + metric) % 20) / 100, 3)
-                 elif metric == "f1_score":
-                     results[metric] = round(base_score - 0.05 + (hash(model_id + metric) % 25) / 100, 3)
-                 elif metric == "bleu":
-                     results[metric] = round(0.25 + (hash(model_id + metric) % 40) / 100, 3)
-                 elif metric == "rouge":
-                     results[metric] = round(0.30 + (hash(model_id + metric) % 35) / 100, 3)
-                 elif metric == "pass_at_k":
-                     results[metric] = round(0.15 + (hash(model_id + metric) % 50) / 100, 3)
-
-             active_evaluations[evaluation_id]["results"][model_id] = results
-
-             await send_websocket_message(evaluation_id, {
-                 "type": "log",
-                 "timestamp": datetime.now().isoformat(),
-                 "level": "SUCCESS",
-                 "message": f"✅ NovaEval completed for {model_name}: {results}"
-             })
-
-             await asyncio.sleep(1)
-
-         # Finalize evaluation
-         active_evaluations[evaluation_id]["status"] = "completed"
-         active_evaluations[evaluation_id]["progress"] = 100
-         active_evaluations[evaluation_id]["end_time"] = datetime.now()
-
-         await send_websocket_message(evaluation_id, {
-             "type": "complete",
-             "results": active_evaluations[evaluation_id]["results"],
-             "message": "🎉 NovaEval evaluation completed successfully!"
-         })
-
-         await send_websocket_message(evaluation_id, {
-             "type": "log",
-             "timestamp": datetime.now().isoformat(),
-             "level": "SUCCESS",
-             "message": "🎯 All NovaEval evaluations completed successfully!"
-         })
-
-         log_request("evaluation_complete", {
-             "evaluation_id": evaluation_id,
-             "results": active_evaluations[evaluation_id]["results"],
-             "duration": (active_evaluations[evaluation_id]["end_time"] - active_evaluations[evaluation_id]["start_time"]).total_seconds()
-         })
-
-     except Exception as e:
-         logger.error(f"NovaEval evaluation failed: {e}")
-         active_evaluations[evaluation_id]["status"] = "failed"
-         active_evaluations[evaluation_id]["error"] = str(e)
-
-         await send_websocket_message(evaluation_id, {
-             "type": "error",
-             "message": f"❌ NovaEval evaluation failed: {str(e)}"
-         })
-
-         log_request("evaluation_error", {
-             "evaluation_id": evaluation_id,
-             "error": str(e),
-             "traceback": traceback.format_exc()
-         })
-
- # API Endpoints
- @app.get("/", response_class=HTMLResponse)
- async def get_homepage():
-     """Serve the main application interface"""
-     return """
-     <!DOCTYPE html>
-     <html lang="en">
-     <head>
-         <meta charset="UTF-8">
-         <meta name="viewport" content="width=device-width, initial-scale=1.0">
-         <title>NovaEval by Noveum.ai - Advanced AI Model Evaluation</title>
-         <script src="https://cdn.tailwindcss.com"></script>
-         <script src="https://unpkg.com/lucide@latest/dist/umd/lucide.js"></script>
-         <style>
-             .gradient-bg {
-                 background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-             }
-             .card-hover {
-                 transition: all 0.3s ease;
-             }
-             .card-hover:hover {
-                 transform: translateY(-2px);
-                 box-shadow: 0 10px 25px rgba(0,0,0,0.1);
-             }
-             .tag-selected {
-                 background: linear-gradient(45deg, #667eea, #764ba2);
-                 color: white;
-             }
-             .tag-unselected {
-                 background: #f3f4f6;
-                 color: #374151;
-             }
-             .tag-unselected:hover {
-                 background: #e5e7eb;
-             }
-             .progress-bar {
-                 transition: width 0.5s ease;
-             }
-             .log-entry {
-                 animation: slideIn 0.3s ease;
-             }
-             @keyframes slideIn {
-                 from { opacity: 0; transform: translateX(-10px); }
-                 to { opacity: 1; transform: translateX(0); }
-             }
-             .compact-card {
-                 min-height: 120px;
-             }
-             .selection-panel {
-                 max-height: 400px;
-                 overflow-y: auto;
-             }
-         </style>
-     </head>
-     <body class="bg-gray-50 min-h-screen">
-         <!-- Header -->
-         <header class="gradient-bg text-white py-4 shadow-lg">
-             <div class="container mx-auto px-4">
-                 <div class="flex items-center justify-between">
-                     <div class="flex items-center space-x-3">
-                         <div class="w-8 h-8 bg-white rounded-lg flex items-center justify-center">
-                             <i data-lucide="zap" class="w-5 h-5 text-purple-600"></i>
-                         </div>
-                         <div>
-                             <h1 class="text-xl font-bold">NovaEval</h1>
-                             <p class="text-purple-100 text-xs">by <a href="https://noveum.ai" target="_blank" class="underline hover:text-white">Noveum.ai</a></p>
-                         </div>
-                     </div>
-                     <div class="text-right">
-                         <p class="text-purple-100 text-sm">Advanced AI Model Evaluation Platform</p>
-                         <p class="text-purple-200 text-xs">Powered by NovaEval Framework</p>
-                     </div>
-                 </div>
-             </div>
-         </header>
-
-         <!-- Info Banner -->
-         <div class="bg-blue-50 border-l-4 border-blue-400 p-4 mb-6">
-             <div class="container mx-auto">
-                 <div class="flex items-start">
-                     <div class="flex-shrink-0">
-                         <i data-lucide="info" class="w-5 h-5 text-blue-400"></i>
-                     </div>
-                     <div class="ml-3">
-                         <h3 class="text-sm font-medium text-blue-800">About NovaEval Platform</h3>
-                         <div class="mt-2 text-sm text-blue-700">
-                             <p>NovaEval is an advanced AI model evaluation framework that provides comprehensive benchmarking across multiple models and datasets. This platform allows you to:</p>
-                             <ul class="list-disc list-inside mt-2 space-y-1">
-                                 <li><strong>Compare Multiple Models:</strong> Evaluate up to 10 Hugging Face models simultaneously</li>
-                                 <li><strong>Comprehensive Datasets:</strong> Test on 11 evaluation datasets across reasoning, knowledge, math, code, and language tasks</li>
-                                 <li><strong>Real-time Monitoring:</strong> Watch live evaluation progress with detailed request/response logging</li>
-                                 <li><strong>Multiple Metrics:</strong> Assess performance using accuracy, F1-score, BLEU, ROUGE, and Pass@K metrics</li>
-                                 <li><strong>NovaEval Framework:</strong> Powered by the open-source NovaEval evaluation framework for reliable, reproducible results</li>
-                             </ul>
-                         </div>
-                     </div>
-                 </div>
-             </div>
-         </div>
-
-         <div class="container mx-auto px-4 py-6">
-             <!-- Main Grid Layout -->
-             <div class="grid grid-cols-1 lg:grid-cols-4 gap-6">
-                 <!-- Left Panel - Selection (3 columns) -->
-                 <div class="lg:col-span-3 space-y-6">
-                     <!-- Selection Row -->
-                     <div class="grid grid-cols-1 md:grid-cols-3 gap-6">
-                         <!-- Models Selection -->
-                         <div class="bg-white rounded-xl shadow-lg p-4 card-hover">
-                             <div class="flex items-center space-x-2 mb-4">
-                                 <i data-lucide="cpu" class="w-5 h-5 text-purple-600"></i>
-                                 <h2 class="text-lg font-semibold text-gray-800">Models</h2>
-                                 <span id="selectedModelsCount" class="text-sm text-gray-500">(0)</span>
-                             </div>
-
-                             <!-- Model Size Filters -->
-                             <div class="flex flex-wrap gap-1 mb-3">
-                                 <button onclick="filterModels('all')" class="px-2 py-1 text-xs rounded-full tag-selected transition-all" id="filter-all">All</button>
-                                 <button onclick="filterModels('small')" class="px-2 py-1 text-xs rounded-full tag-unselected transition-all" id="filter-small">Small</button>
-                                 <button onclick="filterModels('medium')" class="px-2 py-1 text-xs rounded-full tag-unselected transition-all" id="filter-medium">Medium</button>
-                                 <button onclick="filterModels('large')" class="px-2 py-1 text-xs rounded-full tag-unselected transition-all" id="filter-large">Large</button>
-                             </div>
-
-                             <!-- Selected Models Tags -->
-                             <div id="selectedModelsTags" class="mb-3 min-h-[24px]">
-                                 <!-- Selected model tags will appear here -->
-                             </div>
-
-                             <!-- Model Selection Panel -->
-                             <div id="modelGrid" class="selection-panel space-y-2">
-                                 <!-- Models will be populated by JavaScript -->
-                             </div>
-                         </div>
-
-                         <!-- Dataset Selection -->
-                         <div class="bg-white rounded-xl shadow-lg p-4 card-hover">
-                             <div class="flex items-center space-x-2 mb-4">
-                                 <i data-lucide="database" class="w-5 h-5 text-purple-600"></i>
-                                 <h2 class="text-lg font-semibold text-gray-800">Dataset</h2>
-                             </div>
-
-                             <!-- Dataset Category Filters -->
-                             <div class="flex flex-wrap gap-1 mb-3">
-                                 <button onclick="filterDatasets('all')" class="px-2 py-1 text-xs rounded-full tag-selected transition-all" id="dataset-filter-all">All</button>
-                                 <button onclick="filterDatasets('reasoning')" class="px-2 py-1 text-xs rounded-full tag-unselected transition-all" id="dataset-filter-reasoning">Reasoning</button>
-                                 <button onclick="filterDatasets('knowledge')" class="px-2 py-1 text-xs rounded-full tag-unselected transition-all" id="dataset-filter-knowledge">Knowledge</button>
-                                 <button onclick="filterDatasets('math')" class="px-2 py-1 text-xs rounded-full tag-unselected transition-all" id="dataset-filter-math">Math</button>
-                                 <button onclick="filterDatasets('code')" class="px-2 py-1 text-xs rounded-full tag-unselected transition-all" id="dataset-filter-code">Code</button>
-                                 <button onclick="filterDatasets('language')" class="px-2 py-1 text-xs rounded-full tag-unselected transition-all" id="dataset-filter-language">Language</button>
-                             </div>
-
-                             <!-- Selected Dataset Tag -->
-                             <div id="selectedDatasetTag" class="mb-3 min-h-[24px]">
-                                 <!-- Selected dataset tag will appear here -->
-                             </div>
-
-                             <!-- Dataset Selection Panel -->
-                             <div id="datasetGrid" class="selection-panel space-y-2">
-                                 <!-- Datasets will be populated by JavaScript -->
-                             </div>
-                         </div>
-
-                         <!-- Metrics & Config -->
-                         <div class="bg-white rounded-xl shadow-lg p-4 card-hover">
-                             <div class="flex items-center space-x-2 mb-4">
-                                 <i data-lucide="settings" class="w-5 h-5 text-purple-600"></i>
-                                 <h2 class="text-lg font-semibold text-gray-800">Config</h2>
-                             </div>
-
-                             <!-- Selected Metrics Tags -->
-                             <div id="selectedMetricsTags" class="mb-3 min-h-[24px]">
-                                 <!-- Selected metrics tags will appear here -->
-                             </div>
-
-                             <!-- Metrics Selection -->
-                             <div class="mb-4">
-                                 <label class="block text-sm font-medium text-gray-700 mb-2">Metrics</label>
-                                 <div id="metricsGrid" class="space-y-1">
-                                     <!-- Metrics will be populated by JavaScript -->
-                                 </div>
-                             </div>
-
-                             <!-- Parameters -->
-                             <div class="space-y-3">
-                                 <div>
-                                     <label class="block text-xs font-medium text-gray-700 mb-1">Sample Size</label>
-                                     <input type="range" id="sampleSize" min="10" max="1000" value="50" step="10"
-                                         class="w-full h-2 bg-gray-200 rounded-lg appearance-none cursor-pointer">
-                                     <div class="flex justify-between text-xs text-gray-500">
-                                         <span>10</span>
-                                         <span id="sampleSizeValue">50</span>
-                                         <span>1000</span>
-                                     </div>
-                                 </div>
-
-                                 <div>
-                                     <label class="block text-xs font-medium text-gray-700 mb-1">Temperature</label>
-                                     <input type="range" id="temperature" min="0" max="2" step="0.1" value="0.7"
-                                         class="w-full h-2 bg-gray-200 rounded-lg appearance-none cursor-pointer">
-                                     <div class="flex justify-between text-xs text-gray-500">
-                                         <span>0.0</span>
-                                         <span id="temperatureValue">0.7</span>
-                                         <span>2.0</span>
-                                     </div>
-                                 </div>
-                             </div>
-
-                             <!-- Start Button -->
-                             <button onclick="startEvaluation()" id="startBtn"
-                                 class="w-full gradient-bg text-white py-2 px-4 rounded-lg font-semibold hover:opacity-90 transition-opacity disabled:opacity-50 disabled:cursor-not-allowed mt-4 text-sm">
-                                 <i data-lucide="play" class="w-4 h-4 inline mr-1"></i>
-                                 Start NovaEval
-                             </button>
-                         </div>
-                     </div>
-
-                     <!-- Results Panel -->
-                     <div id="resultsPanel" class="bg-white rounded-xl shadow-lg p-6 card-hover hidden">
-                         <div class="flex items-center space-x-3 mb-4">
-                             <i data-lucide="bar-chart" class="w-6 h-6 text-purple-600"></i>
-                             <h2 class="text-xl font-semibold text-gray-800">NovaEval Results</h2>
-                         </div>
-
-                         <div id="resultsContent">
-                             <!-- Results will be populated by JavaScript -->
-                         </div>
-                     </div>
-                 </div>
-
-                 <!-- Right Panel - Progress & Logs (1 column) -->
-                 <div class="space-y-6">
-                     <!-- Progress -->
-                     <div class="bg-white rounded-xl shadow-lg p-4 card-hover">
-                         <div class="flex items-center space-x-2 mb-3">
-                             <i data-lucide="activity" class="w-5 h-5 text-purple-600"></i>
-                             <h2 class="text-lg font-semibold text-gray-800">Progress</h2>
-                         </div>
-
-                         <div id="progressSection" class="hidden">
-                             <div class="mb-3">
-                                 <div class="flex justify-between text-xs text-gray-600 mb-1">
-                                     <span id="currentStep">Initializing...</span>
-                                     <span id="progressPercent">0%</span>
-                                 </div>
-                                 <div class="w-full bg-gray-200 rounded-full h-2">
-                                     <div id="progressBar" class="bg-gradient-to-r from-purple-500 to-blue-500 h-2 rounded-full progress-bar" style="width: 0%"></div>
-                                 </div>
-                             </div>
-                         </div>
-
-                         <div id="idleMessage" class="text-center text-gray-500 py-4">
-                             <i data-lucide="clock" class="w-8 h-8 mx-auto mb-2 text-gray-300"></i>
-                             <p class="text-sm">Ready to start NovaEval</p>
-                         </div>
-                     </div>
-
-                     <!-- Live Logs -->
-                     <div class="bg-white rounded-xl shadow-lg p-4 card-hover">
-                         <div class="flex items-center space-x-2 mb-3">
-                             <i data-lucide="terminal" class="w-5 h-5 text-purple-600"></i>
-                             <h2 class="text-lg font-semibold text-gray-800">Live Logs</h2>
-                             <span class="text-xs text-gray-500">(Requests & Responses)</span>
-                         </div>
-
-                         <div id="logsContainer" class="bg-gray-900 text-green-400 p-3 rounded-lg h-64 overflow-y-auto font-mono text-xs">
-                             <div class="text-gray-500">Waiting for NovaEval to start...</div>
-                         </div>
-                     </div>
-                 </div>
-             </div>
-         </div>
-
-         <script>
-             // Global state
-             let selectedModels = [];
-             let selectedDataset = null;
-             let selectedMetrics = [];
-             let websocket = null;
-             let currentEvaluationId = null;
-
-             // Models data
-             const models = """ + json.dumps(HF_MODELS) + """;
-             const datasets = """ + json.dumps(EVALUATION_DATASETS) + """;
-             const metrics = """ + json.dumps(EVALUATION_METRICS) + """;
-
-             // Initialize the application
-             document.addEventListener('DOMContentLoaded', function() {
-                 lucide.createIcons();
-                 renderModels();
-                 renderDatasets();
-                 renderMetrics();
-                 setupEventListeners();
-             });
-
-             function setupEventListeners() {
-                 // Sample size slider - Fixed to work properly
-                 const sampleSizeSlider = document.getElementById('sampleSize');
-                 const sampleSizeValue = document.getElementById('sampleSizeValue');
-
-                 sampleSizeSlider.addEventListener('input', function() {
-                     sampleSizeValue.textContent = this.value;
-                 });
-
-                 // Temperature slider
-                 const temperatureSlider = document.getElementById('temperature');
-                 const temperatureValue = document.getElementById('temperatureValue');
-
-                 temperatureSlider.addEventListener('input', function() {
-                     temperatureValue.textContent = this.value;
-                 });
-             }
-
-             function renderModels() {
-                 const grid = document.getElementById('modelGrid');
-                 grid.innerHTML = '';
-
-                 Object.keys(models).forEach(category => {
-                     models[category].forEach(model => {
-                         const modelCard = createModelCard(model, category);
-                         grid.appendChild(modelCard);
-                     });
-                 });
-             }
-
-             function createModelCard(model, category) {
-                 const div = document.createElement('div');
-                 div.className = `model-card p-2 border rounded-lg cursor-pointer hover:shadow-md transition-all compact-card`;
-                 div.dataset.category = category;
-                 div.dataset.modelId = model.id;
-
-                 div.innerHTML = `
-                     <div class="flex items-start justify-between mb-1">
-                         <div class="flex-1">
-                             <h3 class="font-semibold text-gray-800 text-sm">${model.name}</h3>
-                             <p class="text-xs text-gray-500">${model.provider}</p>
-                         </div>
-                         <div class="text-xs bg-gray-100 px-2 py-1 rounded">${model.size}</div>
-                     </div>
-                     <p class="text-xs text-gray-600 mb-2 line-clamp-2">${model.description}</p>
-                     <div class="flex flex-wrap gap-1">
-                         ${model.capabilities.slice(0, 2).map(cap => `<span class="text-xs bg-purple-100 text-purple-700 px-1 py-0.5 rounded">${cap}</span>`).join('')}
-                     </div>
-                 `;
-
-                 div.addEventListener('click', () => toggleModelSelection(model.id, model.name, div));
-                 return div;
-             }
-
-             function toggleModelSelection(modelId, modelName, element) {
-                 if (selectedModels.includes(modelId)) {
-                     selectedModels = selectedModels.filter(id => id !== modelId);
-                     element.classList.remove('ring-2', 'ring-purple-500', 'bg-purple-50');
-                 } else {
-                     selectedModels.push(modelId);
-                     element.classList.add('ring-2', 'ring-purple-500', 'bg-purple-50');
-                 }
-                 updateSelectedModelsTags();
-                 updateSelectedModelsCount();
-             }
-
-             function updateSelectedModelsTags() {
-                 const container = document.getElementById('selectedModelsTags');
-                 container.innerHTML = '';
-
-                 selectedModels.forEach(modelId => {
-                     const modelName = getModelName(modelId);
-                     const tag = document.createElement('span');
-                     tag.className = 'inline-flex items-center px-2 py-1 text-xs bg-purple-100 text-purple-800 rounded-full mr-1 mb-1';
-                     tag.innerHTML = `
-                         ${modelName}
-                         <button onclick="removeModel('${modelId}')" class="ml-1 text-purple-600 hover:text-purple-800">
-                             <i data-lucide="x" class="w-3 h-3"></i>
-                         </button>
-                     `;
-                     container.appendChild(tag);
-                 });
-                 lucide.createIcons();
-             }
-
-             function removeModel(modelId) {
-                 selectedModels = selectedModels.filter(id => id !== modelId);
-                 // Update UI
-                 const modelCard = document.querySelector(`[data-model-id="${modelId}"]`);
-                 if (modelCard) {
-                     modelCard.classList.remove('ring-2', 'ring-purple-500', 'bg-purple-50');
-                 }
-                 updateSelectedModelsTags();
-                 updateSelectedModelsCount();
-             }
-
-             function getModelName(modelId) {
-                 for (const category of Object.values(models)) {
-                     for (const model of category) {
-                         if (model.id === modelId) {
-                             return model.name;
-                         }
-                     }
-                 }
-                 return modelId.split('/').pop();
995
- }
996
-
997
- function updateSelectedModelsCount() {
998
- document.getElementById('selectedModelsCount').textContent = `(${selectedModels.length})`;
999
- }
1000
-
1001
- function filterModels(category) {
1002
- // Update filter buttons
1003
- document.querySelectorAll('[id^="filter-"]').forEach(btn => {
1004
- btn.className = btn.className.replace('tag-selected', 'tag-unselected');
1005
- });
1006
- document.getElementById(`filter-${category}`).className =
1007
- document.getElementById(`filter-${category}`).className.replace('tag-unselected', 'tag-selected');
1008
-
1009
- // Filter model cards
1010
- document.querySelectorAll('.model-card').forEach(card => {
1011
- if (category === 'all' || card.dataset.category === category) {
1012
- card.style.display = 'block';
1013
- } else {
1014
- card.style.display = 'none';
1015
- }
1016
- });
1017
- }
1018
-
1019
- function renderDatasets() {
1020
- const grid = document.getElementById('datasetGrid');
1021
- grid.innerHTML = '';
1022
-
1023
- Object.keys(datasets).forEach(category => {
1024
- datasets[category].forEach(dataset => {
1025
- const datasetCard = createDatasetCard(dataset, category);
1026
- grid.appendChild(datasetCard);
1027
- });
1028
- });
1029
- }
1030
-
1031
- function createDatasetCard(dataset, category) {
1032
- const div = document.createElement('div');
1033
- div.className = `dataset-card p-2 border rounded-lg cursor-pointer hover:shadow-md transition-all compact-card`;
1034
- div.dataset.category = category;
1035
- div.dataset.datasetId = dataset.id;
1036
-
1037
- div.innerHTML = `
1038
- <div class="flex items-start justify-between mb-1">
1039
- <div class="flex-1">
1040
- <h3 class="font-semibold text-gray-800 text-sm">${dataset.name}</h3>
1041
- <p class="text-xs text-gray-600 line-clamp-2">${dataset.description}</p>
1042
- </div>
1043
- <div class="text-xs bg-gray-100 px-1 py-0.5 rounded">${dataset.samples.toLocaleString()}</div>
1044
- </div>
1045
- <div class="flex justify-between items-center mt-2">
1046
- <span class="text-xs bg-blue-100 text-blue-700 px-1 py-0.5 rounded">${dataset.task_type}</span>
1047
- <span class="text-xs text-gray-500">${dataset.difficulty}</span>
1048
- </div>
1049
- `;
1050
-
1051
- div.addEventListener('click', () => selectDataset(dataset.id, dataset.name, div));
1052
- return div;
1053
- }
1054
-
1055
- function selectDataset(datasetId, datasetName, element) {
1056
- // Remove previous selection
1057
- document.querySelectorAll('.dataset-card').forEach(card => {
1058
- card.classList.remove('ring-2', 'ring-purple-500', 'bg-purple-50');
1059
- });
1060
-
1061
- // Add selection to clicked element
1062
- element.classList.add('ring-2', 'ring-purple-500', 'bg-purple-50');
1063
- selectedDataset = datasetId;
1064
-
1065
- // Update selected dataset tag
1066
- updateSelectedDatasetTag(datasetName);
1067
- }
1068
-
1069
- function updateSelectedDatasetTag(datasetName) {
1070
- const container = document.getElementById('selectedDatasetTag');
1071
- container.innerHTML = `
1072
- <span class="inline-flex items-center px-2 py-1 text-xs bg-blue-100 text-blue-800 rounded-full">
1073
- ${datasetName}
1074
- <button onclick="removeDataset()" class="ml-1 text-blue-600 hover:text-blue-800">
1075
- <i data-lucide="x" class="w-3 h-3"></i>
1076
- </button>
1077
- </span>
1078
- `;
1079
- lucide.createIcons();
1080
- }
1081
-
1082
- function removeDataset() {
1083
- selectedDataset = null;
1084
- document.getElementById('selectedDatasetTag').innerHTML = '';
1085
- document.querySelectorAll('.dataset-card').forEach(card => {
1086
- card.classList.remove('ring-2', 'ring-purple-500', 'bg-purple-50');
1087
- });
1088
- }
1089
-
1090
- function filterDatasets(category) {
1091
- // Update filter buttons
1092
- document.querySelectorAll('[id^="dataset-filter-"]').forEach(btn => {
1093
- btn.className = btn.className.replace('tag-selected', 'tag-unselected');
1094
- });
1095
- document.getElementById(`dataset-filter-${category}`).className =
1096
- document.getElementById(`dataset-filter-${category}`).className.replace('tag-unselected', 'tag-selected');
1097
-
1098
- // Filter dataset cards
1099
- document.querySelectorAll('.dataset-card').forEach(card => {
1100
- if (category === 'all' || card.dataset.category === category) {
1101
- card.style.display = 'block';
1102
- } else {
1103
- card.style.display = 'none';
1104
- }
1105
- });
1106
- }
1107
-
1108
- function renderMetrics() {
1109
- const grid = document.getElementById('metricsGrid');
1110
- grid.innerHTML = '';
1111
-
1112
- metrics.forEach(metric => {
1113
- const div = document.createElement('div');
1114
- div.className = 'flex items-center space-x-2';
1115
-
1116
- div.innerHTML = `
1117
- <input type="checkbox" id="metric-${metric.id}" class="rounded text-purple-600 focus:ring-purple-500">
1118
- <label for="metric-${metric.id}" class="text-xs text-gray-700 cursor-pointer">${metric.name}</label>
1119
- `;
1120
-
1121
- const checkbox = div.querySelector('input');
1122
- checkbox.addEventListener('change', () => {
1123
- if (checkbox.checked) {
1124
- selectedMetrics.push(metric.id);
1125
- } else {
1126
- selectedMetrics = selectedMetrics.filter(id => id !== metric.id);
1127
- }
1128
- updateSelectedMetricsTags();
1129
- });
1130
-
1131
- grid.appendChild(div);
1132
- });
1133
- }
1134
-
1135
- function updateSelectedMetricsTags() {
1136
- const container = document.getElementById('selectedMetricsTags');
1137
- container.innerHTML = '';
1138
-
1139
- selectedMetrics.forEach(metricId => {
1140
- const metricName = getMetricName(metricId);
1141
- const tag = document.createElement('span');
1142
- tag.className = 'inline-flex items-center px-2 py-1 text-xs bg-green-100 text-green-800 rounded-full mr-1 mb-1';
1143
- tag.innerHTML = `
1144
- ${metricName}
1145
- <button onclick="removeMetric('${metricId}')" class="ml-1 text-green-600 hover:text-green-800">
1146
- <i data-lucide="x" class="w-3 h-3"></i>
1147
- </button>
1148
- `;
1149
- container.appendChild(tag);
1150
- });
1151
- lucide.createIcons();
1152
- }
1153
-
1154
- function removeMetric(metricId) {
1155
- selectedMetrics = selectedMetrics.filter(id => id !== metricId);
1156
- // Update checkbox
1157
- const checkbox = document.getElementById(`metric-${metricId}`);
1158
- if (checkbox) {
1159
- checkbox.checked = false;
1160
- }
1161
- updateSelectedMetricsTags();
1162
- }
1163
-
1164
- function getMetricName(metricId) {
1165
- const metric = metrics.find(m => m.id === metricId);
1166
- return metric ? metric.name : metricId;
1167
- }
1168
-
1169
- function startEvaluation() {
1170
- // Validation
1171
- if (selectedModels.length === 0) {
1172
- alert('Please select at least one model');
1173
- return;
1174
- }
1175
-
1176
- if (!selectedDataset) {
1177
- alert('Please select a dataset');
1178
- return;
1179
- }
1180
-
1181
- if (selectedMetrics.length === 0) {
1182
- alert('Please select at least one metric');
1183
- return;
1184
- }
1185
-
1186
- // Prepare request
1187
- const request = {
1188
- models: selectedModels,
1189
- dataset: selectedDataset,
1190
- metrics: selectedMetrics,
1191
- sample_size: parseInt(document.getElementById('sampleSize').value),
1192
- temperature: parseFloat(document.getElementById('temperature').value),
1193
- max_tokens: 512,
1194
- top_p: 0.9
1195
- };
1196
-
1197
- // Start evaluation
1198
- fetch('/api/evaluate', {
1199
- method: 'POST',
1200
- headers: {
1201
- 'Content-Type': 'application/json'
1202
- },
1203
- body: JSON.stringify(request)
1204
- })
1205
- .then(response => response.json())
1206
- .then(data => {
1207
- if (data.status === 'started') {
1208
- currentEvaluationId = data.evaluation_id;
1209
- connectWebSocket(data.evaluation_id);
1210
- showProgress();
1211
- disableStartButton();
1212
- } else {
1213
- alert('Failed to start NovaEval: ' + data.message);
1214
- }
1215
- })
1216
- .catch(error => {
1217
- console.error('Error:', error);
1218
- alert('Failed to start NovaEval');
1219
- });
1220
- }
1221
-
1222
- function connectWebSocket(evaluationId) {
1223
- const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
1224
- const wsUrl = `${protocol}//${window.location.host}/ws/${evaluationId}`;
1225
-
1226
- websocket = new WebSocket(wsUrl);
1227
-
1228
- websocket.onmessage = function(event) {
1229
- const data = JSON.parse(event.data);
1230
- handleWebSocketMessage(data);
1231
- };
1232
-
1233
- websocket.onclose = function() {
1234
- console.log('WebSocket connection closed');
1235
- };
1236
-
1237
- websocket.onerror = function(error) {
1238
- console.error('WebSocket error:', error);
1239
- };
1240
- }
1241
-
1242
- function handleWebSocketMessage(data) {
1243
- switch (data.type) {
1244
- case 'progress':
1245
- updateProgress(data.progress, data.current_step);
1246
- break;
1247
- case 'log':
1248
- addLogEntry(data);
1249
- break;
1250
- case 'complete':
1251
- showResults(data.results);
1252
- enableStartButton();
1253
- break;
1254
- case 'error':
1255
- addLogEntry({
1256
- level: 'ERROR',
1257
- message: data.message,
1258
- timestamp: new Date().toISOString()
1259
- });
1260
- enableStartButton();
1261
- break;
1262
- }
1263
- }
1264
-
1265
- function showProgress() {
1266
- document.getElementById('idleMessage').classList.add('hidden');
1267
- document.getElementById('progressSection').classList.remove('hidden');
1268
- clearLogs();
1269
- }
1270
-
1271
- function updateProgress(progress, currentStep) {
1272
- document.getElementById('progressBar').style.width = progress + '%';
1273
- document.getElementById('progressPercent').textContent = Math.round(progress) + '%';
1274
- document.getElementById('currentStep').textContent = currentStep;
1275
- }
1276
-
1277
- function addLogEntry(logData) {
1278
- const container = document.getElementById('logsContainer');
1279
- const entry = document.createElement('div');
1280
- entry.className = 'log-entry mb-1';
1281
-
1282
- const timestamp = new Date(logData.timestamp).toLocaleTimeString();
1283
- const levelColor = {
1284
- 'INFO': 'text-blue-400',
1285
- 'SUCCESS': 'text-green-400',
1286
- 'ERROR': 'text-red-400',
1287
- 'DEBUG': 'text-yellow-400',
1288
- 'WARNING': 'text-orange-400'
1289
- }[logData.level] || 'text-green-400';
1290
-
1291
- entry.innerHTML = `
1292
- <span class="text-gray-500">[${timestamp}]</span>
1293
- <span class="${levelColor}">[${logData.level}]</span>
1294
- <span>${logData.message}</span>
1295
- `;
1296
-
1297
- container.appendChild(entry);
1298
- container.scrollTop = container.scrollHeight;
1299
- }
1300
-
1301
- function clearLogs() {
1302
- document.getElementById('logsContainer').innerHTML = '';
1303
- }
1304
-
1305
- function showResults(results) {
1306
- const panel = document.getElementById('resultsPanel');
1307
- const content = document.getElementById('resultsContent');
1308
-
1309
- let html = '<div class="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">';
1310
-
1311
- // Show results for ALL selected models
1312
- selectedModels.forEach(modelId => {
1313
- const modelName = getModelName(modelId);
1314
- const modelResults = results[modelId] || {};
1315
-
1316
- html += `
1317
- <div class="border rounded-lg p-4 bg-gray-50">
1318
- <h3 class="font-semibold text-gray-800 mb-3">${modelName}</h3>
1319
- <div class="space-y-2">
1320
- `;
1321
-
1322
- if (Object.keys(modelResults).length > 0) {
1323
- Object.keys(modelResults).forEach(metric => {
1324
- const value = modelResults[metric];
1325
- html += `
1326
- <div class="flex justify-between items-center">
1327
- <span class="text-sm text-gray-600">${metric.toUpperCase()}</span>
1328
- <span class="text-lg font-semibold text-gray-800">${value}</span>
1329
- </div>
1330
- `;
1331
- });
1332
- } else {
1333
- html += '<div class="text-sm text-gray-500">No results available</div>';
1334
- }
1335
-
1336
- html += '</div></div>';
1337
- });
1338
-
1339
- html += '</div>';
1340
- content.innerHTML = html;
1341
- panel.classList.remove('hidden');
1342
- }
1343
-
1344
- function disableStartButton() {
1345
- const btn = document.getElementById('startBtn');
1346
- btn.disabled = true;
1347
- btn.innerHTML = '<i data-lucide="loader" class="w-4 h-4 inline mr-1 animate-spin"></i>Running NovaEval...';
1348
- lucide.createIcons();
1349
- }
1350
-
1351
- function enableStartButton() {
1352
- const btn = document.getElementById('startBtn');
1353
- btn.disabled = false;
1354
- btn.innerHTML = '<i data-lucide="play" class="w-4 h-4 inline mr-1"></i>Start NovaEval';
1355
- lucide.createIcons();
1356
- }
1357
- </script>
1358
- </body>
1359
- </html>
1360
- """
1361
-
1362
- @app.get("/api/models")
1363
- async def get_models():
1364
- """Get available models"""
1365
- log_request("get_models", {})
1366
- return {"models": HF_MODELS}
1367
-
1368
- @app.get("/api/datasets")
1369
- async def get_datasets():
1370
- """Get available datasets"""
1371
- log_request("get_datasets", {})
1372
- return {"datasets": EVALUATION_DATASETS}
1373
-
1374
- @app.get("/api/metrics")
1375
- async def get_metrics():
1376
- """Get available metrics"""
1377
- log_request("get_metrics", {})
1378
- return {"metrics": EVALUATION_METRICS}
1379
-
1380
- @app.get("/api/logs")
1381
- async def get_request_logs():
1382
- """Get recent request logs"""
1383
- return {"logs": request_logs[-100:]} # Return last 100 logs
1384
-
1385
- @app.post("/api/evaluate")
1386
- async def start_evaluation(request: EvaluationRequest):
1387
- """Start a new NovaEval evaluation"""
1388
- evaluation_id = str(uuid.uuid4())
1389
-
1390
- log_request("start_evaluation", {
1391
- "evaluation_id": evaluation_id,
1392
- "request": request.dict()
1393
- })
1394
-
1395
- # Start evaluation in background
1396
- asyncio.create_task(run_novaeval_evaluation(evaluation_id, request))
1397
-
1398
- return EvaluationResponse(
1399
- evaluation_id=evaluation_id,
1400
- status="started",
1401
- message="NovaEval evaluation started successfully"
1402
- )
1403
-
1404
- @app.get("/api/evaluation/{evaluation_id}")
1405
- async def get_evaluation_status(evaluation_id: str):
1406
- """Get evaluation status"""
1407
- if evaluation_id not in active_evaluations:
1408
- raise HTTPException(status_code=404, detail="Evaluation not found")
1409
-
1410
- log_request("get_evaluation_status", {"evaluation_id": evaluation_id})
1411
- return active_evaluations[evaluation_id]
1412
-
1413
- @app.websocket("/ws/{evaluation_id}")
1414
- async def websocket_endpoint(websocket: WebSocket, evaluation_id: str):
1415
- """WebSocket endpoint for real-time updates"""
1416
- await websocket.accept()
1417
- websocket_connections[evaluation_id] = websocket
1418
-
1419
- log_request("websocket_connect", {"evaluation_id": evaluation_id})
1420
-
1421
- try:
1422
- while True:
1423
- # Keep connection alive
1424
- await asyncio.sleep(1)
1425
- except WebSocketDisconnect:
1426
- if evaluation_id in websocket_connections:
1427
- del websocket_connections[evaluation_id]
1428
- log_request("websocket_disconnect", {"evaluation_id": evaluation_id})
1429
-
1430
- @app.get("/api/health")
1431
- async def health_check():
1432
- """Health check endpoint"""
1433
- return {
1434
- "status": "healthy",
1435
- "timestamp": datetime.now().isoformat(),
1436
- "service": "novaeval-platform",
1437
- "version": "4.0.0",
1438
- "framework": "NovaEval"
1439
- }
1440
-
1441
- if __name__ == "__main__":
1442
- logger.info("Starting NovaEval Platform v4.0.0")
1443
- logger.info("Framework: NovaEval")
1444
- logger.info("Models: Hugging Face")
1445
- logger.info("Features: Real evaluations, detailed logging, request/response tracking")
1446
- uvicorn.run(app, host="0.0.0.0", port=7860)
1447
-
fixed-novaeval-space.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b51898f4d5d22ec3dff47d34bc2a0e4a35be243938d1fefc505cf95fe8f96103
+ size 127518
novaeval-space-deployment.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cb881e568838cea8305a35504ed324af2d936149f2438fa4e2aa8fa797e2920
+ size 24411
package.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "name": "react-template",
+   "version": "0.1.0",
+   "private": true,
+   "dependencies": {
+     "@testing-library/dom": "^10.4.0",
+     "@testing-library/jest-dom": "^6.6.3",
+     "@testing-library/react": "^16.3.0",
+     "@testing-library/user-event": "^13.5.0",
+     "react": "^19.1.0",
+     "react-dom": "^19.1.0",
+     "react-scripts": "5.0.1",
+     "web-vitals": "^2.1.4"
+   },
+   "scripts": {
+     "start": "react-scripts start",
+     "build": "react-scripts build",
+     "test": "react-scripts test",
+     "eject": "react-scripts eject"
+   },
+   "eslintConfig": {
+     "extends": [
+       "react-app",
+       "react-app/jest"
+     ]
+   },
+   "browserslist": {
+     "production": [
+       ">0.2%",
+       "not dead",
+       "not op_mini all"
+     ],
+     "development": [
+       "last 1 chrome version",
+       "last 1 firefox version",
+       "last 1 safari version"
+     ]
+   }
+ }
requirements.txt DELETED
@@ -1,6 +0,0 @@
- fastapi==0.116.0
- uvicorn==0.35.0
- websockets==15.0.1
- httpx==0.28.1
- pydantic==2.11.7
-