Spaces: Noveumai / NovaEval


shashankagar committed on Jul 15

Commit 81e8991 · verified · 1 Parent(s): 900252d

Upload 4 files

Files changed (4)

Dockerfile +5 -7
README.md +165 -56
app.py +0 -0
requirements.txt +5 -22

Dockerfile CHANGED

@@ -1,4 +1,3 @@
-# Comprehensive NovaEval Space Dockerfile
 FROM python:3.11-slim
 # Set working directory
@@ -6,18 +5,17 @@ WORKDIR /app
 # Install system dependencies
 RUN apt-get update && apt-get install -y \
-    git \
-    build-essential \
     && rm -rf /var/lib/apt/lists/*
 # Copy requirements and install Python dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
-# Copy application
-COPY app.py .
-# Create non-root user
 RUN useradd -m -u 1000 user
 USER user
@@ -28,6 +26,6 @@ EXPOSE 7860
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
     CMD curl -f http://localhost:7860/api/health || exit 1
-# Run application
 CMD ["python", "app.py"]

 FROM python:3.11-slim
 # Set working directory
 # Install system dependencies
 RUN apt-get update && apt-get install -y \
+    curl \
     && rm -rf /var/lib/apt/lists/*
 # Copy requirements and install Python dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY advanced_novaeval_app.py app.py
+# Create non-root user for security
 RUN useradd -m -u 1000 user
 USER user
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
     CMD curl -f http://localhost:7860/api/health || exit 1
+# Run the application
 CMD ["python", "app.py"]
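The `HEALTHCHECK` above shells out to `curl -f`, which is presumably why this commit adds `curl` to the apt install list. As a rough illustration of what that probe tests, the sketch below performs an equivalent check with the Python standard library: request the URL and treat any HTTP error status, refused connection, or timeout as unhealthy. The helper name `is_healthy` is hypothetical and not part of the commit; only the URL and the 10-second timeout are taken from the Dockerfile.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if the endpoint answers without an HTTP error status.

    Hypothetical helper that mirrors `curl -f <url> || exit 1`.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a URLError subclass; refused connections
        # and socket timeouts surface as URLError or OSError.
        return False
```

A call such as `is_healthy("http://localhost:7860/api/health")` could back a health check in an image that does not ship `curl`, although the commit itself relies on `curl`.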

README.md CHANGED

@@ -1,73 +1,182 @@
----
-title: NovaEval by Noveum.ai - Advanced AI Model Evaluation Platform
-emoji: 🚀
-colorFrom: blue
-colorTo: purple
-sdk: docker
-pinned: false
-license: mit
-app_port: 7860
----
-# NovaEval by Noveum.ai - Advanced AI Model Evaluation Platform
-A comprehensive platform for evaluating AI language models using the NovaEval framework. Built by [Noveum.ai](https://noveum.ai) for the AI research community.
-## 🌟 Features
-### 🤖 Latest LLMs
-- **GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo** (OpenAI)
-- **Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku** (Anthropic)
-- **Amazon Titan, Cohere Command** (AWS Bedrock)
-- **Noveum AI Gateway** (Noveum.ai)
-### 📊 Comprehensive Datasets
-- **MMLU** - Massive Multitask Language Understanding
-- **HumanEval** - Code Generation Benchmark
-- **HellaSwag** - Commonsense Reasoning
-- **GSM8K** - Grade School Math
-- **TruthfulQA** - Truthfulness Assessment
-- **Custom Dataset Upload** - Bring your own data
-### ⚡ Advanced Analytics
-- **Real-time Evaluation Logs** - Live request/response monitoring
-- **Detailed Metrics** - Accuracy, F1-Score, BLEU, ROUGE, Semantic Similarity
-- **Interactive Visualizations** - Charts, comparisons, statistical analysis
-- **Export Results** - JSON, CSV formats
-### 🔧 Advanced Configuration
-- **Sample Size Control** - 10 to 1000 samples
-- **Model Parameters** - Temperature, max tokens, top-p
-- **Evaluation Settings** - Batch size, timeout, retry logic
-- **Cost Estimation** - Real-time cost tracking
-## 🚀 Quick Start
-1. **Select Models** - Choose up to 5 LLMs from different providers
-2. **Choose Dataset** - Pick from academic benchmarks or upload custom data
-3. **Pick Metrics** - Select evaluation metrics for your use case
-4. **Configure** - Set parameters and start evaluation
-5. **Analyze** - View real-time results and detailed analytics
-## 🔗 Links
 - **Noveum.ai**: [https://noveum.ai](https://noveum.ai)
-- **NovaEval GitHub**: [https://github.com/Noveum/NovaEval](https://github.com/Noveum/NovaEval)
-- **Documentation**: [NovaEval Docs](https://github.com/Noveum/NovaEval#readme)
-## 🛠️ Technical Details
-- **Framework**: NovaEval v0.3.3
-- **Backend**: FastAPI with WebSocket support
-- **Frontend**: Modern HTML5/CSS3/JavaScript
-- **Models**: OpenAI, Anthropic, AWS Bedrock, Noveum.ai APIs
-- **Deployment**: Docker on Hugging Face Spaces
-## 📝 License
-MIT License - See [LICENSE](https://github.com/Noveum/NovaEval/blob/main/LICENSE) for details.
 ---
-**Powered by NovaEval v0.3.3 | Built with ❤️ by [Noveum.ai](https://noveum.ai)**

+# NovaEval by Noveum.ai
+Advanced AI Model Evaluation Platform powered by Hugging Face Models
+## 🚀 Features
+### 🤖 **Comprehensive Model Selection**
+- **15+ Top Hugging Face Models** across different size categories
+- **Real-time Model Search** with provider filtering
+- **Detailed Model Information** including capabilities, size, and provider
+- **Size-based Filtering** (Small 1-3B, Medium 7B, Large 14B+)
+### 📊 **Rich Dataset Collection**
+- **11 Evaluation Datasets** covering reasoning, knowledge, math, code, and language
+- **Category-based Filtering** for easy dataset discovery
+- **Detailed Dataset Information** including sample counts and difficulty levels
+- **Popular Benchmarks** like MMLU, HellaSwag, GSM8K, HumanEval
+### ⚡ **Advanced Evaluation Engine**
+- **Real-time Progress Tracking** with WebSocket updates
+- **Live Evaluation Logs** showing detailed request/response data
+- **Multiple Metrics Support** (Accuracy, F1-Score, BLEU, ROUGE, Pass@K)
+- **Configurable Parameters** (sample size, temperature, max tokens)
+### 🎨 **Modern User Interface**
+- **Responsive Design** optimized for desktop and mobile
+- **Interactive Model Cards** with hover effects and selection states
+- **Real-time Configuration** with sliders and checkboxes
+- **Professional Gradient Design** with smooth animations
+## 🔧 **Technical Stack**
+- **Backend**: FastAPI + Python 3.11
+- **Frontend**: HTML5 + Tailwind CSS + Vanilla JavaScript
+- **Real-time**: WebSocket for live updates
+- **Models**: Hugging Face Inference API (free tier)
+- **Deployment**: Docker + Hugging Face Spaces
+## 📋 **Available Models**
+### Small Models (1-3B)
+- **FLAN-T5 Large** (0.8B) - Google
+- **Qwen 2.5 3B** (3B) - Alibaba
+- **Gemma 2B** (2B) - Google
+### Medium Models (7B)
+- **Qwen 2.5 7B** (7B) - Alibaba
+- **Mistral 7B** (7B) - Mistral AI
+- **DialoGPT Medium** (345M) - Microsoft
+- **CodeLlama 7B Python** (7B) - Meta
+### Large Models (14B+)
+- **Qwen 2.5 14B** (14B) - Alibaba
+- **Qwen 2.5 32B** (32B) - Alibaba
+- **Qwen 2.5 72B** (72B) - Alibaba
+## 📊 **Available Datasets**
+### Reasoning
+- **HellaSwag** - Commonsense reasoning (60K samples)
+- **CommonsenseQA** - Reasoning questions (12.1K samples)
+- **ARC** - Science reasoning (7.8K samples)
+### Knowledge
+- **MMLU** - Multitask understanding (231K samples)
+- **BoolQ** - Reading comprehension (12.7K samples)
+### Math
+- **GSM8K** - Grade school math (17.6K samples)
+- **AQUA-RAT** - Algebraic reasoning (196K samples)
+### Code
+- **HumanEval** - Python code generation (164 samples)
+- **MBPP** - Basic Python problems (1.4K samples)
+### Language
+- **IMDB Reviews** - Sentiment analysis (100K samples)
+- **CNN/DailyMail** - Summarization (936K samples)
+## 🎯 **Evaluation Metrics**
+- **Accuracy** - Percentage of correct predictions
+- **F1 Score** - Harmonic mean of precision and recall
+- **BLEU Score** - Text generation quality
+- **ROUGE Score** - Summarization quality
+- **Pass@K** - Code generation success rate
+## 🚀 **Quick Start**
+### Option 1: Direct Upload to Hugging Face Spaces
+1. Create a new Space on Hugging Face
+2. Choose "Docker" as the SDK
+3. Upload these files:
+   - `app.py` (renamed from `advanced_novaeval_app.py`)
+   - `requirements.txt`
+   - `Dockerfile`
+   - `README.md`
+4. Commit and push - your Space will build automatically!
+### Option 2: Local Development
+```bash
+# Install dependencies
+pip install -r requirements.txt
+# Run the application
+python advanced_novaeval_app.py
+# Open browser to http://localhost:7860
+```
+## 🔧 **Configuration Options**
+### Model Parameters
+- **Sample Size**: 10-1000 samples
+- **Temperature**: 0.0-2.0 (creativity control)
+- **Max Tokens**: 128-2048 (response length)
+- **Top-p**: 0.9 (nucleus sampling)
+### Evaluation Settings
+- **Multiple Model Selection**: Compare up to 10 models
+- **Flexible Metrics**: Choose relevant metrics for your task
+- **Real-time Monitoring**: Watch evaluations progress live
+- **Export Results**: Download results in JSON format
+## 📱 **User Experience**
+### Workflow
+1. **Select Models** - Choose from 15+ Hugging Face models
+2. **Pick Dataset** - Select from 11 evaluation datasets
+3. **Configure Metrics** - Choose relevant evaluation metrics
+4. **Set Parameters** - Adjust sample size, temperature, etc.
+5. **Start Evaluation** - Watch real-time progress and logs
+6. **View Results** - Analyze performance comparisons
+### Features
+- **Model Search** - Find models by name or provider
+- **Category Filtering** - Filter by model size or dataset type
+- **Real-time Logs** - See actual evaluation steps
+- **Progress Tracking** - Visual progress bars and percentages
+- **Interactive Results** - Compare models side-by-side
+## 🌟 **Why NovaEval?**
+### For Researchers
+- **Comprehensive Benchmarking** across multiple models and datasets
+- **Standardized Evaluation** with consistent metrics and procedures
+- **Real-time Monitoring** to track evaluation progress
+- **Export Capabilities** for further analysis
+### For Developers
+- **Easy Integration** with Hugging Face ecosystem
+- **No API Keys Required** - uses free HF Inference API
+- **Modern Interface** with responsive design
+- **Detailed Logging** for debugging and analysis
+### For Teams
+- **Collaborative Evaluation** with shareable results
+- **Professional Interface** suitable for presentations
+- **Comprehensive Documentation** for easy onboarding
+- **Open Source** with full customization capabilities
+## 🔗 **Links**
 - **Noveum.ai**: [https://noveum.ai](https://noveum.ai)
+- **NovaEval Framework**: [https://github.com/Noveum/NovaEval](https://github.com/Noveum/NovaEval)
+- **Hugging Face Models**: [https://huggingface.co/models](https://huggingface.co/models)
+- **Documentation**: Available in the application interface
+## 📄 **License**
+This project is open source and available under the MIT License.
+## 🤝 **Contributing**
+We welcome contributions! Please see our contributing guidelines for more information.
 ---
+**Built with ❤️ by [Noveum.ai](https://noveum.ai) - Advancing AI Evaluation**
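The README's metrics list names Accuracy, F1 Score, and Pass@K without giving formulas. The sketch below shows one common way to compute each, using only the standard library. It is an assumption that the app computes them this way, since the `app.py` diff is not rendered; the function names are hypothetical, and `pass_at_k` uses the unbiased estimator popularized alongside HumanEval, where `n` samples are generated per problem and `c` of them pass.

```python
from math import comb
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def f1_score(predictions: Sequence[bool], references: Sequence[bool]) -> float:
    """Binary F1: harmonic mean of precision and recall."""
    tp = sum(p and r for p, r in zip(predictions, references))
    fp = sum(p and not r for p, r in zip(predictions, references))
    fn = sum(r and not p for p, r in zip(predictions, references))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n passes."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples per problem of which 3 pass, `pass_at_k(10, 3, 1)` is the chance that a single randomly chosen sample passes.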

app.py CHANGED

The diff for this file is too large to render.

requirements.txt CHANGED

@@ -1,23 +1,6 @@
-# Comprehensive NovaEval Space Requirements
-fastapi>=0.104.0
-uvicorn[standard]>=0.24.0
-websockets>=12.0
-httpx>=0.25.0
-pydantic>=2.5.0
-python-multipart>=0.0.6
-# NovaEval and dependencies
-git+https://github.com/Noveum/NovaEval.git
-# Additional ML dependencies
-transformers>=4.35.0
-torch>=2.1.0
-datasets>=2.14.0
-evaluate>=0.4.0
-accelerate>=0.24.0
-tokenizers>=0.15.0
-# Optional: For better performance
-numpy>=1.24.0
-pandas>=2.0.0

+fastapi==0.116.0
+uvicorn==0.35.0
+websockets==15.0.1
+httpx==0.28.1
+pydantic==2.11.7
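The new `requirements.txt` pins each package to an exact version with `==`, where the old file used `>=` lower bounds. As a sketch of how such pins could be checked against an environment, the code below parses `name==version` lines and reports any package whose installed version differs. Both function names are hypothetical, and the `installed` mapping is supplied by the caller here; in practice it could be filled using `importlib.metadata.version`.

```python
from typing import Optional


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Collect exact pins (name==version); other specifiers are skipped."""
    pins: dict[str, str] = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(
    pins: dict[str, str], installed: dict[str, str]
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each mismatched package to (pinned, installed-or-None)."""
    mismatches: dict[str, tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        actual = installed.get(name)
        if actual != pinned:
            mismatches[name] = (pinned, actual)
    return mismatches


NEW_REQUIREMENTS = """\
fastapi==0.116.0
uvicorn==0.35.0
websockets==15.0.1
httpx==0.28.1
pydantic==2.11.7
"""

pins = parse_pins(NEW_REQUIREMENTS)
```

Exact pins make a rebuilt Space reproduce the same dependency set, at the cost of needing manual bumps to pick up fixes.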