shashankagar committed · Commit 900252d · verified · 1 Parent(s): 278a646

Upload 4 files

Files changed (4)
  1. Dockerfile +12 -11
  2. README.md +47 -66
  3. app.py +0 -0
  4. requirements.txt +11 -8
Dockerfile CHANGED
@@ -1,32 +1,33 @@
-# Real NovaEval Space Dockerfile
+# Comprehensive NovaEval Space Dockerfile
 FROM python:3.11-slim
 
+# Set working directory
+WORKDIR /app
+
 # Install system dependencies
 RUN apt-get update && apt-get install -y \
     git \
     build-essential \
     && rm -rf /var/lib/apt/lists/*
 
-WORKDIR /app
-
 # Copy requirements and install Python dependencies
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 
-# Copy application code
+# Copy application
 COPY app.py .
 
-# Create directory for temporary files
-RUN mkdir -p /tmp/novaeval
+# Create non-root user
+RUN useradd -m -u 1000 user
+USER user
 
 # Expose port
 EXPOSE 7860
 
-# Set environment variables
-ENV PYTHONUNBUFFERED=1
-ENV TRANSFORMERS_CACHE=/tmp/transformers_cache
-ENV HF_HOME=/tmp/hf_cache
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:7860/api/health || exit 1
 
-# Run the application
+# Run application
 CMD ["python", "app.py"]
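
Review note on the new HEALTHCHECK: it shells out to `curl`, but the `python:3.11-slim` base image does not ship `curl`, and the `apt-get` step installs only `git` and `build-essential`, so the probe would likely fail as written unless `curl` is added. One alternative is a probe that needs only the Python standard library. The sketch below assumes the app answers `GET /api/health` with a 2xx status (the route appears only in the HEALTHCHECK line; the `app.py` diff is not rendered); the names `is_healthy` and `main` are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:7860/api/health", timeout: float = 5.0) -> bool:
    """Return True if GET `url` answers with a 2xx status within `timeout` seconds."""
    # An empty ProxyHandler bypasses any proxy settings from the environment,
    # which is what a probe against localhost wants.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers connection refused, timeouts, HTTP 4xx/5xx (HTTPError is a
        # URLError subclass), malformed responses, and malformed URLs.
        return False


def main() -> int:
    """Exit-code convention expected by Docker: 0 = healthy, 1 = unhealthy."""
    return 0 if is_healthy() else 1
```

If this were saved as, say, `healthcheck.py`, copied into the image, and ended with `sys.exit(main())`, the Dockerfile line could become `CMD python healthcheck.py` in place of the `curl` call. The file name is an assumption, not part of this commit.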
 
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
-title: NovaEval - Real AI Model Evaluation Platform
-emoji: 🧪
+title: NovaEval by Noveum.ai - Advanced AI Model Evaluation Platform
+emoji: 🚀
 colorFrom: blue
 colorTo: purple
 sdk: docker
@@ -9,84 +9,65 @@ license: mit
 app_port: 7860
 ---
 
-# NovaEval - Real AI Model Evaluation Platform
+# NovaEval by Noveum.ai - Advanced AI Model Evaluation Platform
 
-A comprehensive evaluation platform for AI models using the actual NovaEval framework with real evaluations and live logs.
+A comprehensive platform for evaluating AI language models using the NovaEval framework. Built by [Noveum.ai](https://noveum.ai) for the AI research community.
 
-## 🚀 Features
+## 🌟 Features
 
-### Real Evaluations
-- **Actual NovaEval Integration**: Uses the genuine NovaEval v0.3.3 package
-- **Real Model Testing**: Authentic evaluation of Hugging Face models
-- **Live Progress Tracking**: WebSocket-based real-time updates
-- **Genuine Metrics**: Actual accuracy, F1-score, and BLEU calculations
+### 🤖 Latest LLMs
+- **GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo** (OpenAI)
+- **Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku** (Anthropic)
+- **Amazon Titan, Cohere Command** (AWS Bedrock)
+- **Noveum AI Gateway** (Noveum.ai)
 
-### Live Logging
-- **Real-time Logs**: Watch evaluation progress with live log streaming
-- **Detailed Progress**: Step-by-step evaluation process visibility
-- **Error Handling**: Comprehensive error reporting and recovery
-- **Performance Monitoring**: Real evaluation timing and resource usage
+### 📊 Comprehensive Datasets
+- **MMLU** - Massive Multitask Language Understanding
+- **HumanEval** - Code Generation Benchmark
+- **HellaSwag** - Commonsense Reasoning
+- **GSM8K** - Grade School Math
+- **TruthfulQA** - Truthfulness Assessment
+- **Custom Dataset Upload** - Bring your own data
 
-### Supported Models
-- **DialoGPT Medium**: Microsoft's conversational AI model
-- **FLAN-T5 Base**: Google's instruction-tuned language model
-- **Mistral 7B Instruct**: High-performance instruction-following model
+### Advanced Analytics
+- **Real-time Evaluation Logs** - Live request/response monitoring
+- **Detailed Metrics** - Accuracy, F1-Score, BLEU, ROUGE, Semantic Similarity
+- **Interactive Visualizations** - Charts, comparisons, statistical analysis
+- **Export Results** - JSON, CSV formats
 
-### Evaluation Datasets
-- **MMLU**: Massive Multitask Language Understanding
-- **HellaSwag**: Commonsense reasoning evaluation
-- **HumanEval**: Code generation assessment
+### 🔧 Advanced Configuration
+- **Sample Size Control** - 10 to 1000 samples
+- **Model Parameters** - Temperature, max tokens, top-p
+- **Evaluation Settings** - Batch size, timeout, retry logic
+- **Cost Estimation** - Real-time cost tracking
 
-### Metrics
-- **Accuracy**: Classification accuracy measurement
-- **F1-Score**: Balanced precision and recall evaluation
-- **BLEU Score**: Text generation quality assessment
+## 🚀 Quick Start
 
-## 🔧 Technical Implementation
+1. **Select Models** - Choose up to 5 LLMs from different providers
+2. **Choose Dataset** - Pick from academic benchmarks or upload custom data
+3. **Pick Metrics** - Select evaluation metrics for your use case
+4. **Configure** - Set parameters and start evaluation
+5. **Analyze** - View real-time results and detailed analytics
 
-### Backend
-- **FastAPI**: High-performance async web framework
-- **WebSocket**: Real-time bidirectional communication
-- **NovaEval**: Actual evaluation framework integration
-- **Transformers**: Hugging Face model loading and inference
+## 🔗 Links
 
-### Frontend
-- **Modern UI**: Beautiful gradient design with animations
-- **Responsive**: Mobile-friendly interface
-- **Real-time Updates**: Live progress and log streaming
-- **Interactive**: Dynamic model, dataset, and metric selection
+- **Noveum.ai**: [https://noveum.ai](https://noveum.ai)
+- **NovaEval GitHub**: [https://github.com/Noveum/NovaEval](https://github.com/Noveum/NovaEval)
+- **Documentation**: [NovaEval Docs](https://github.com/Noveum/NovaEval#readme)
 
-## 🎯 Usage
+## 🛠️ Technical Details
 
-1. **Select Models**: Choose up to 2 models for comparison
-2. **Pick Dataset**: Select evaluation dataset (MMLU, HellaSwag, HumanEval)
-3. **Choose Metrics**: Pick evaluation metrics (Accuracy, F1, BLEU)
-4. **Run Evaluation**: Start real evaluation with live progress tracking
-5. **View Results**: Analyze authentic evaluation results and comparisons
+- **Framework**: NovaEval v0.3.3
+- **Backend**: FastAPI with WebSocket support
+- **Frontend**: Modern HTML5/CSS3/JavaScript
+- **Models**: OpenAI, Anthropic, AWS Bedrock, Noveum.ai APIs
+- **Deployment**: Docker on Hugging Face Spaces
 
-## 🔍 Real Evaluation Process
+## 📝 License
 
-1. **Model Loading**: Actual Hugging Face model initialization
-2. **Dataset Preparation**: Real dataset loading and preprocessing
-3. **Evaluation Execution**: Genuine model inference and scoring
-4. **Metrics Calculation**: Authentic metric computation
-5. **Results Generation**: Real performance analysis and comparison
+MIT License - See [LICENSE](https://github.com/Noveum/NovaEval/blob/main/LICENSE) for details.
 
-## 📊 Live Features
-
-- **Progress Bar**: Real-time evaluation progress
-- **Log Streaming**: Live evaluation logs with timestamps
-- **Status Updates**: Current evaluation step and progress
-- **Error Reporting**: Detailed error messages and recovery
-- **Results Display**: Professional results visualization
-
-## 🌟 Advantages
-
-- **Authentic**: Real evaluations using actual NovaEval framework
-- **Transparent**: Live logs show exactly what's happening
-- **Reliable**: Robust error handling and recovery
-- **Educational**: Learn how real AI evaluation works
-- **Comparative**: Side-by-side model performance analysis
+---
 
-Powered by [NovaEval](https://github.com/Noveum/NovaEval) and [Hugging Face](https://huggingface.co)
+**Powered by NovaEval v0.3.3 | Built with ❤️ by [Noveum.ai](https://noveum.ai)**
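
Review note on the new README: the "Advanced Configuration" and "Quick Start" sections describe a set of evaluation settings (up to 5 models, 10 to 1000 samples, temperature, max tokens, top-p, batch size, timeout, retries) without showing their shape. Since the `app.py` diff is not rendered, the following is only a sketch of how such settings could be grouped and validated. The class name, field names, and every default value are assumptions for illustration and are not NovaEval's API; only the model-count and sample-size limits come from the README text.

```python
from dataclasses import dataclass, field


@dataclass
class EvalConfig:
    """Hypothetical container for the settings the README lists (not NovaEval's API)."""

    models: list[str]
    dataset: str
    metrics: list[str] = field(default_factory=lambda: ["accuracy"])
    sample_size: int = 100    # README: 10 to 1000 samples; default is assumed
    temperature: float = 0.0  # assumed default
    max_tokens: int = 256     # assumed default
    top_p: float = 1.0        # assumed default
    batch_size: int = 8       # assumed default
    timeout_s: float = 60.0   # assumed default
    max_retries: int = 3      # assumed default

    def __post_init__(self) -> None:
        # Limits stated in the README.
        if not 1 <= len(self.models) <= 5:
            raise ValueError("select between 1 and 5 models")
        if not 10 <= self.sample_size <= 1000:
            raise ValueError("sample_size must be between 10 and 1000")
        # Generic sanity checks (assumptions, not taken from the README).
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.batch_size < 1 or self.max_tokens < 1 or self.max_retries < 0:
            raise ValueError("batch_size and max_tokens must be >= 1, max_retries >= 0")
        if self.timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
```

For example, `EvalConfig(models=["gpt-4o"], dataset="mmlu", sample_size=50)` would construct successfully, while `sample_size=5` or a list of six models would raise `ValueError`. The model and dataset identifiers here are placeholder strings.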
 
app.py CHANGED
The diff for this file is too large to render.
 
requirements.txt CHANGED
@@ -1,20 +1,23 @@
-# Real NovaEval Space Requirements
+# Comprehensive NovaEval Space Requirements
 fastapi>=0.104.0
 uvicorn[standard]>=0.24.0
-websockets>=11.0.0
+websockets>=12.0
 httpx>=0.25.0
-pydantic>=2.0.0
+pydantic>=2.5.0
+python-multipart>=0.0.6
 
-# NovaEval and ML dependencies
-novaeval>=0.3.3
+# NovaEval and dependencies
+git+https://github.com/Noveum/NovaEval.git
+
+# Additional ML dependencies
 transformers>=4.35.0
-torch>=2.0.0
+torch>=2.1.0
 datasets>=2.14.0
 evaluate>=0.4.0
 accelerate>=0.24.0
+tokenizers>=0.15.0
 
-# Additional evaluation libraries
-scikit-learn>=1.3.0
+# Optional: For better performance
 numpy>=1.24.0
 pandas>=2.0.0
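
Review note on the requirements change: NovaEval moves from the pinned `novaeval>=0.3.3` to an unpinned install from the repository's default branch, so the installed version may not match the "v0.3.3" the README still advertises. To make changes like this easier to see at a glance, the two versions of the file can be compared programmatically. The sketch below is a deliberately simplified parser (it ignores environment markers, `-r` includes, and URL fragments); the function names are hypothetical.

```python
import re

OLD = """fastapi>=0.104.0
uvicorn[standard]>=0.24.0
websockets>=11.0.0
httpx>=0.25.0
pydantic>=2.0.0
novaeval>=0.3.3
transformers>=4.35.0
torch>=2.0.0
datasets>=2.14.0
evaluate>=0.4.0
accelerate>=0.24.0
scikit-learn>=1.3.0
numpy>=1.24.0
pandas>=2.0.0
"""

NEW = """fastapi>=0.104.0
uvicorn[standard]>=0.24.0
websockets>=12.0
httpx>=0.25.0
pydantic>=2.5.0
python-multipart>=0.0.6
git+https://github.com/Noveum/NovaEval.git
transformers>=4.35.0
torch>=2.1.0
datasets>=2.14.0
evaluate>=0.4.0
accelerate>=0.24.0
tokenizers>=0.15.0
numpy>=1.24.0
pandas>=2.0.0
"""


def parse_requirements(text: str) -> dict[str, str]:
    """Map lowercase package name -> version specifier (or the VCS URL)."""
    reqs: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplification)
        if not line:
            continue
        if line.startswith("git+"):
            # Use the repository name as the package name (an approximation).
            name = line.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git").lower()
            reqs[name] = line
            continue
        m = re.match(r"([A-Za-z0-9._-]+)(\[[^\]]*\])?\s*(.*)", line)
        if m:
            reqs[m.group(1).lower()] = m.group(3)
    return reqs


def summarize(old: str, new: str) -> dict[str, dict]:
    """Group packages into added, removed, and changed between two files."""
    a, b = parse_requirements(old), parse_requirements(new)
    return {
        "added": {k: b[k] for k in b.keys() - a.keys()},
        "removed": {k: a[k] for k in a.keys() - b.keys()},
        "changed": {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]},
    }
```

Calling `summarize(OLD, NEW)` on the two versions in this diff reports `python-multipart` and `tokenizers` as added, `scikit-learn` as removed, and `websockets`, `pydantic`, `torch`, and `novaeval` as changed.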