---
title: NovaEval by Noveum.ai - Advanced AI Model Evaluation Platform
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# NovaEval by Noveum.ai - Advanced AI Model Evaluation Platform

A comprehensive platform for evaluating AI language models using the NovaEval framework. Built by [Noveum.ai](https://noveum.ai) for the AI research community.

## 🌟 Features

### 🤖 Latest LLMs

- **GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo** (OpenAI)
- **Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku** (Anthropic)
- **Amazon Titan, Cohere Command** (AWS Bedrock)
- **Noveum AI Gateway** (Noveum.ai)

### 📊 Comprehensive Datasets

- **MMLU** - Massive Multitask Language Understanding
- **HumanEval** - Code Generation Benchmark
- **HellaSwag** - Commonsense Reasoning
- **GSM8K** - Grade School Math
- **TruthfulQA** - Truthfulness Assessment
- **Custom Dataset Upload** - Bring your own data

### ⚡ Advanced Analytics

- **Real-time Evaluation Logs** - Live request/response monitoring
- **Detailed Metrics** - Accuracy, F1-Score, BLEU, ROUGE, Semantic Similarity
- **Interactive Visualizations** - Charts, comparisons, statistical analysis
- **Export Results** - JSON, CSV formats

### 🔧 Advanced Configuration

- **Sample Size Control** - 10 to 1000 samples
- **Model Parameters** - Temperature, max tokens, top-p
- **Evaluation Settings** - Batch size, timeout, retry logic
- **Cost Estimation** - Real-time cost tracking

## 🚀 Quick Start

1. **Select Models** - Choose up to 5 LLMs from different providers
2. **Choose Dataset** - Pick from academic benchmarks or upload custom data
3. **Pick Metrics** - Select evaluation metrics for your use case
4. **Configure** - Set parameters and start evaluation
5. **Analyze** - View real-time results and detailed analytics

## 🔗 Links

- **Noveum.ai**: [https://noveum.ai](https://noveum.ai)
- **NovaEval GitHub**: [https://github.com/Noveum/NovaEval](https://github.com/Noveum/NovaEval)
- **Documentation**: [NovaEval Docs](https://github.com/Noveum/NovaEval#readme)

## 🛠️ Technical Details

- **Framework**: NovaEval v0.3.3
- **Backend**: FastAPI with WebSocket support
- **Frontend**: Modern HTML5/CSS3/JavaScript
- **Models**: OpenAI, Anthropic, AWS Bedrock, Noveum.ai APIs
- **Deployment**: Docker on Hugging Face Spaces

## 📝 License

MIT License - See [LICENSE](https://github.com/Noveum/NovaEval/blob/main/LICENSE) for details.

---

**Powered by NovaEval v0.3.3 | Built with ❤️ by [Noveum.ai](https://noveum.ai)**
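
## 🧪 Example: Checking a Run Configuration

The limits mentioned in Quick Start and Advanced Configuration (up to 5 models, 10 to 1000 samples) can be expressed as a small validation helper. The sketch below is illustrative only: `RunConfig`, its field names, its default values, and the model and dataset identifier strings are hypothetical and are not part of the NovaEval API. The checks on temperature, top-p, and max tokens are general assumptions, not limits documented by this platform.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits taken from this README.
MAX_MODELS = 5        # Quick Start: "Choose up to 5 LLMs"
MIN_SAMPLES = 10      # Advanced Configuration: "10 to 1000 samples"
MAX_SAMPLES = 1000


@dataclass
class RunConfig:
    """Hypothetical container for one evaluation run (not a NovaEval class)."""

    models: list[str]
    dataset: str
    metrics: list[str]
    sample_size: int = 100      # assumed default
    temperature: float = 0.0    # assumed default
    max_tokens: int = 512       # assumed default
    top_p: float = 1.0          # assumed default

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config is valid."""
        problems: list[str] = []
        if not 1 <= len(self.models) <= MAX_MODELS:
            problems.append(f"select between 1 and {MAX_MODELS} models")
        if not MIN_SAMPLES <= self.sample_size <= MAX_SAMPLES:
            problems.append(
                f"sample_size must be between {MIN_SAMPLES} and {MAX_SAMPLES}"
            )
        if not self.metrics:
            problems.append("select at least one metric")
        # The remaining checks are generic sanity checks (assumptions).
        if self.temperature < 0:
            problems.append("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            problems.append("top_p must be in (0, 1]")
        if self.max_tokens < 1:
            problems.append("max_tokens must be at least 1")
        return problems


if __name__ == "__main__":
    config = RunConfig(
        models=["gpt-4o"],      # illustrative identifier
        dataset="mmlu",         # illustrative identifier
        metrics=["accuracy"],
        sample_size=50,
    )
    issues = config.validate()
    print("valid" if not issues else issues)
```

Returning a list of problems rather than raising on the first one lets a UI show every issue at once before an evaluation starts.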