Spaces: Noveumai/NovaEval


shashankagar committed on Jul 15

Commit 900252d (verified) · Parent: 278a646

Upload 4 files


Files changed (4)

Dockerfile +12 -11
README.md +47 -66
app.py +0 -0
requirements.txt +11 -8

Dockerfile CHANGED

@@ -1,32 +1,33 @@
-# Real NovaEval Space Dockerfile
 FROM python:3.11-slim
 # Install system dependencies
 RUN apt-get update && apt-get install -y \
     git \
     build-essential \
     && rm -rf /var/lib/apt/lists/*
-WORKDIR /app
 # Copy requirements and install Python dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
-# Copy application code
 COPY app.py .
-# Create directory for temporary files
-RUN mkdir -p /tmp/novaeval
 # Expose port
 EXPOSE 7860
-# Set environment variables
-ENV PYTHONUNBUFFERED=1
-ENV TRANSFORMERS_CACHE=/tmp/transformers_cache
-ENV HF_HOME=/tmp/hf_cache
-# Run the application
 CMD ["python", "app.py"]

+# Comprehensive NovaEval Space Dockerfile
 FROM python:3.11-slim
+# Set working directory
+WORKDIR /app
 # Install system dependencies
 RUN apt-get update && apt-get install -y \
     git \
     build-essential \
     && rm -rf /var/lib/apt/lists/*
 # Copy requirements and install Python dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
+# Copy application
 COPY app.py .
+# Create non-root user
+RUN useradd -m -u 1000 user
+USER user
 # Expose port
 EXPOSE 7860
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:7860/api/health || exit 1
+# Run application
 CMD ["python", "app.py"]
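The new `HEALTHCHECK` invokes `curl`, but the `python:3.11-slim` base image does not include `curl` and it is not in the `apt-get install` list, so as written the probe would fail and Docker would mark the container unhealthy. Adding `curl` to that `apt-get install` line is the simplest fix. An alternative that needs no extra package is a standard-library probe like the minimal sketch below. The module name `healthcheck.py` and the `main()` wrapper are hypothetical; the `/api/health` path and port 7860 are taken from the Dockerfile.

```python
# healthcheck.py (hypothetical name): standard-library probe for the Space's health endpoint.
import http.client
import urllib.request

# Assumption: path and port copied from the Dockerfile's HEALTHCHECK line.
HEALTH_URL = "http://localhost:7860/api/health"


def check_health(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True only if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (refused connection, non-2xx status) and socket
        # timeouts are OSError subclasses; HTTPException covers malformed replies.
        return False


def main() -> int:
    """Exit code in Docker's HEALTHCHECK convention: 0 = healthy, 1 = unhealthy."""
    return 0 if check_health() else 1
```

With `COPY healthcheck.py .` added next to `COPY app.py .`, the check command could become `python -c "import sys, healthcheck; sys.exit(healthcheck.main())"` under the same `HEALTHCHECK` options, assuming it runs from the `/app` working directory so the module is importable.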

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
-title: NovaEval - Real AI Model Evaluation Platform
-emoji: 🧪
 colorFrom: blue
 colorTo: purple
 sdk: docker
@@ -9,84 +9,65 @@ license: mit
 app_port: 7860
 ---
-# NovaEval - Real AI Model Evaluation Platform
-A comprehensive evaluation platform for AI models using the actual NovaEval framework with real evaluations and live logs.
-## 🚀 Features
-### Real Evaluations
-- **Actual NovaEval Integration**: Uses the genuine NovaEval v0.3.3 package
-- **Real Model Testing**: Authentic evaluation of Hugging Face models
-- **Live Progress Tracking**: WebSocket-based real-time updates
-- **Genuine Metrics**: Actual accuracy, F1-score, and BLEU calculations
-### Live Logging
-- **Real-time Logs**: Watch evaluation progress with live log streaming
-- **Detailed Progress**: Step-by-step evaluation process visibility
-- **Error Handling**: Comprehensive error reporting and recovery
-- **Performance Monitoring**: Real evaluation timing and resource usage
-### Supported Models
-- **DialoGPT Medium**: Microsoft's conversational AI model
-- **FLAN-T5 Base**: Google's instruction-tuned language model
-- **Mistral 7B Instruct**: High-performance instruction-following model
-### Evaluation Datasets
-- **MMLU**: Massive Multitask Language Understanding
-- **HellaSwag**: Commonsense reasoning evaluation
-- **HumanEval**: Code generation assessment
-### Metrics
-- **Accuracy**: Classification accuracy measurement
-- **F1-Score**: Balanced precision and recall evaluation
-- **BLEU Score**: Text generation quality assessment
-## 🔧 Technical Implementation
-### Backend
-- **FastAPI**: High-performance async web framework
-- **WebSocket**: Real-time bidirectional communication
-- **NovaEval**: Actual evaluation framework integration
-- **Transformers**: Hugging Face model loading and inference
-### Frontend
-- **Modern UI**: Beautiful gradient design with animations
-- **Responsive**: Mobile-friendly interface
-- **Real-time Updates**: Live progress and log streaming
-- **Interactive**: Dynamic model, dataset, and metric selection
-## 🎯 Usage
-1. **Select Models**: Choose up to 2 models for comparison
-2. **Pick Dataset**: Select evaluation dataset (MMLU, HellaSwag, HumanEval)
-3. **Choose Metrics**: Pick evaluation metrics (Accuracy, F1, BLEU)
-4. **Run Evaluation**: Start real evaluation with live progress tracking
-5. **View Results**: Analyze authentic evaluation results and comparisons
-## 🔍 Real Evaluation Process
-1. **Model Loading**: Actual Hugging Face model initialization
-2. **Dataset Preparation**: Real dataset loading and preprocessing
-3. **Evaluation Execution**: Genuine model inference and scoring
-4. **Metrics Calculation**: Authentic metric computation
-5. **Results Generation**: Real performance analysis and comparison
-## 📊 Live Features
-- **Progress Bar**: Real-time evaluation progress
-- **Log Streaming**: Live evaluation logs with timestamps
-- **Status Updates**: Current evaluation step and progress
-- **Error Reporting**: Detailed error messages and recovery
-- **Results Display**: Professional results visualization
-## 🌟 Advantages
-- **Authentic**: Real evaluations using actual NovaEval framework
-- **Transparent**: Live logs show exactly what's happening
-- **Reliable**: Robust error handling and recovery
-- **Educational**: Learn how real AI evaluation works
-- **Comparative**: Side-by-side model performance analysis
-Powered by [NovaEval](https://github.com/Noveum/NovaEval) and [Hugging Face](https://huggingface.co)

 ---
+title: NovaEval by Noveum.ai - Advanced AI Model Evaluation Platform
+emoji: 🚀
 colorFrom: blue
 colorTo: purple
 sdk: docker
 app_port: 7860
 ---
+# NovaEval by Noveum.ai - Advanced AI Model Evaluation Platform
+A comprehensive platform for evaluating AI language models using the NovaEval framework. Built by [Noveum.ai](https://noveum.ai) for the AI research community.
+## 🌟 Features
+### 🤖 Latest LLMs
+- **GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo** (OpenAI)
+- **Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku** (Anthropic)
+- **Amazon Titan, Cohere Command** (AWS Bedrock)
+- **Noveum AI Gateway** (Noveum.ai)
+### 📊 Comprehensive Datasets
+- **MMLU** - Massive Multitask Language Understanding
+- **HumanEval** - Code Generation Benchmark
+- **HellaSwag** - Commonsense Reasoning
+- **GSM8K** - Grade School Math
+- **TruthfulQA** - Truthfulness Assessment
+- **Custom Dataset Upload** - Bring your own data
+### ⚡ Advanced Analytics
+- **Real-time Evaluation Logs** - Live request/response monitoring
+- **Detailed Metrics** - Accuracy, F1-Score, BLEU, ROUGE, Semantic Similarity
+- **Interactive Visualizations** - Charts, comparisons, statistical analysis
+- **Export Results** - JSON, CSV formats
+### 🔧 Advanced Configuration
+- **Sample Size Control** - 10 to 1000 samples
+- **Model Parameters** - Temperature, max tokens, top-p
+- **Evaluation Settings** - Batch size, timeout, retry logic
+- **Cost Estimation** - Real-time cost tracking
+## 🚀 Quick Start
+1. **Select Models** - Choose up to 5 LLMs from different providers
+2. **Choose Dataset** - Pick from academic benchmarks or upload custom data
+3. **Pick Metrics** - Select evaluation metrics for your use case
+4. **Configure** - Set parameters and start evaluation
+5. **Analyze** - View real-time results and detailed analytics
+## 🔗 Links
+- **Noveum.ai**: [https://noveum.ai](https://noveum.ai)
+- **NovaEval GitHub**: [https://github.com/Noveum/NovaEval](https://github.com/Noveum/NovaEval)
+- **Documentation**: [NovaEval Docs](https://github.com/Noveum/NovaEval#readme)
+## 🛠️ Technical Details
+- **Framework**: NovaEval v0.3.3
+- **Backend**: FastAPI with WebSocket support
+- **Frontend**: Modern HTML5/CSS3/JavaScript
+- **Models**: OpenAI, Anthropic, AWS Bedrock, Noveum.ai APIs
+- **Deployment**: Docker on Hugging Face Spaces
+## 📝 License
+MIT License - See [LICENSE](https://github.com/Noveum/NovaEval/blob/main/LICENSE) for details.
+---
+**Powered by NovaEval v0.3.3 | Built with ❤️ by [Noveum.ai](https://noveum.ai)**
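The README lists accuracy and F1-score among the evaluation metrics. As a rough illustration of what these commonly mean for short text answers, here is a minimal sketch of exact-match accuracy and token-overlap F1 (in the style popularized by SQuAD, with simpler normalization). It is not NovaEval's scorer API: the function names are hypothetical and the lowercase/whitespace normalization is an assumption, so NovaEval's own implementations may normalize or aggregate differently.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (deliberately simple normalization)."""
    return text.lower().split()


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose normalized tokens equal the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall, using multiset overlap."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(predictions: List[str], references: List[str]) -> float:
    """Average token F1 over aligned prediction/reference pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    return sum(token_f1(p, r) for p, r in zip(predictions, references)) / len(predictions)
```

For example, the prediction `"the answer is 4"` against the reference `"4"` scores 0 on exact match, while token F1 gives partial credit of 0.4 (precision 1/4, recall 1).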

app.py CHANGED

The diff for this file is too large to render.

requirements.txt CHANGED

@@ -1,20 +1,23 @@
-# Real NovaEval Space Requirements
 fastapi>=0.104.0
 uvicorn[standard]>=0.24.0
-websockets>=11.0.0
 httpx>=0.25.0
-pydantic>=2.0.0
-# NovaEval and ML dependencies
-novaeval>=0.3.3
 transformers>=4.35.0
-torch>=2.0.0
 datasets>=2.14.0
 evaluate>=0.4.0
 accelerate>=0.24.0
-# Additional evaluation libraries
-scikit-learn>=1.3.0
 numpy>=1.24.0
 pandas>=2.0.0

+# Comprehensive NovaEval Space Requirements
 fastapi>=0.104.0
 uvicorn[standard]>=0.24.0
+websockets>=12.0
 httpx>=0.25.0
+pydantic>=2.5.0
+python-multipart>=0.0.6
+# NovaEval and dependencies
+git+https://github.com/Noveum/NovaEval.git
+# Additional ML dependencies
 transformers>=4.35.0
+torch>=2.1.0
 datasets>=2.14.0
 evaluate>=0.4.0
 accelerate>=0.24.0
+tokenizers>=0.15.0
+# Optional: For better performance
 numpy>=1.24.0
 pandas>=2.0.0
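This change replaces the `novaeval>=0.3.3` pin with an unpinned install from the repository's default branch, so the NovaEval version inside the image depends on when the image was built and may not match the v0.3.3 stated in the README. A minimal sketch for logging installed versions at startup, using `importlib.metadata` from the standard library. It assumes the Git install still registers the distribution name `novaeval` (the name used by the removed pin); `installed_version` and `report_versions` are hypothetical helper names.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(
    dist_names: Iterable[str] = ("novaeval", "torch", "transformers"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}
```

Calling `print(report_versions())` early in `app.py` would record which builds are actually running. Alternatively, pip accepts a ref after a Git URL (`git+https://github.com/Noveum/NovaEval.git@<tag-or-commit>`), which would pin the install to a known revision.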