# RAG Benchmark Evaluation System
## Overview
This project implements a Retrieval-Augmented Generation (RAG) system for evaluating different language models and reranking strategies. It provides a user-friendly interface for querying documents and analyzing the performance of various models.
## Features
- Multiple LLM support (LLaMA 3.3, Mistral 7B)
- Various reranking models:
  - MS MARCO MiniLM
  - MS MARCO TinyBERT
  - MonoT5 Base
  - MonoT5 Small
  - MonoT5 3B
- Vector similarity search using Milvus (see the retrieval sketch below)
- Automatic document chunking and retrieval
- Performance metrics calculation
- Interactive Gradio interface
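For example, a top-k retrieval against Milvus might look like the following minimal sketch. The collection name, embedding model, and schema here are illustrative assumptions, not the project's actual configuration:

```python
# Minimal retrieval sketch using Milvus Lite and sentence-transformers.
# Collection name, embedding model, and schema are illustrative assumptions.
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
client = MilvusClient("rag_benchmark.db")           # local Milvus Lite file (or a server URI)

if not client.has_collection("documents"):
    client.create_collection(collection_name="documents", dimension=384)

# Index some example chunks
chunks = ["Milvus is a vector database.", "RAG combines retrieval with generation."]
client.insert(
    collection_name="documents",
    data=[{"id": i, "vector": embedder.encode(c).tolist(), "text": c}
          for i, c in enumerate(chunks)],
)

# Retrieve the top-k chunks most similar to the query
query_vec = embedder.encode("What is Milvus?").tolist()
hits = client.search(collection_name="documents", data=[query_vec],
                     limit=2, output_fields=["text"])
for hit in hits[0]:
    print(hit["entity"]["text"], hit["distance"])
```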
## Prerequisites
- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
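To confirm whether the optional GPU will actually be used, a quick check with `torch` (already in the dependency list):

```python
import torch

# Use the GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")
```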
## Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/rag-benchmark.git
   cd rag-benchmark
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Configure the models (see the layout sketch below):
   - Create a `models` directory and add your language model files.
   - Create a `rerankers` directory and add your reranking model files.
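The expected directory layout after these steps (the example model families in the comments are placeholders, not required file names):

```
rag-benchmark/
├── app.py
├── requirements.txt
├── models/        # language model files (e.g., LLaMA 3.3, Mistral 7B)
└── rerankers/     # reranker model files (e.g., MS MARCO, MonoT5 variants)
```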
## Usage
1. Start the application:
   ```bash
   python app.py
   ```
2. Access the web interface at `http://localhost:7860`
3. Enter your question and select:
- LLM Model (LLaMA 3.3 or Mistral 7B)
- Reranking Model (MS MARCO or MonoT5 variants)
4. Click "Evaluate Model" to get results
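The interface can also be driven programmatically via `gradio_client`. The sketch below is a guess at such a call: the endpoint name and the argument order are assumptions, not verified against `app.py`:

```python
# Hypothetical programmatic access to the running Gradio app.
# The api_name and the argument order are assumptions about app.py.
from gradio_client import Client

client = Client("http://localhost:7860")
result = client.predict(
    "What does the document say about chunking?",  # question
    "LLaMA 3.3",                                   # LLM model
    "MS MARCO MiniLM",                             # reranking model
    api_name="/predict",
)
print(result)
```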
## Metrics
The system calculates several performance metrics:
- RMSE Context Relevance
- RMSE Context Utilization
- AUCROC Adherence
- Processing Time
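A minimal sketch of how the RMSE and AUCROC metrics can be computed with scikit-learn (the arrays here are toy data; in the system, the ground-truth labels come from the benchmark dataset):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

# Toy data: predicted vs. ground-truth relevance scores in [0, 1].
true_relevance = np.array([0.9, 0.2, 0.7, 0.4])
pred_relevance = np.array([0.8, 0.3, 0.6, 0.5])

# RMSE: root of the mean squared error between predicted and true scores.
rmse = np.sqrt(mean_squared_error(true_relevance, pred_relevance))

# AUCROC adherence: binary ground truth vs. predicted adherence probability.
true_adherence = np.array([1, 0, 1, 1])
pred_adherence = np.array([0.85, 0.30, 0.65, 0.90])
auroc = roc_auc_score(true_adherence, pred_adherence)

print(f"RMSE: {rmse:.3f}, AUCROC: {auroc:.3f}")
```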
## Reranking Models Comparison
### MS MARCO Models
- **MiniLM**: Fast and efficient, good general performance
- **TinyBERT**: Lightweight, slightly lower accuracy but faster
### MonoT5 Models
- **Small**: Compact and fast, suitable for limited resources
- **Base**: Balanced performance and speed
- **3B**: Highest accuracy, requires more computational resources
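The MS MARCO cross-encoders can be loaded through `sentence-transformers`. A minimal scoring sketch, using the public MiniLM checkpoint from the Hugging Face Hub and toy passages:

```python
from sentence_transformers import CrossEncoder

# Public MS MARCO MiniLM cross-encoder checkpoint on Hugging Face.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is retrieval-augmented generation?"
passages = [
    "RAG augments a language model with retrieved documents.",
    "Milvus is an open-source vector database.",
]

# Higher score = more relevant; sort passages by descending score.
scores = reranker.predict([(query, p) for p in passages])
ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
for passage, score in ranked:
    print(f"{score:.2f}  {passage}")
```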
## Error Handling
- Automatic fallback to fewer documents if token limits are exceeded
- Graceful handling of API timeouts
- Comprehensive error logging
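A sketch of the token-limit fallback idea using `tiktoken` (the encoding name and token budget are illustrative; the real limits depend on the chosen LLM):

```python
from typing import List

import tiktoken


def fit_documents(docs: List[str], max_tokens: int = 4096) -> List[str]:
    """Drop lowest-ranked documents until the context fits the token budget."""
    # cl100k_base is an illustrative choice; the real tokenizer depends on the LLM.
    enc = tiktoken.get_encoding("cl100k_base")
    kept = list(docs)
    while kept and sum(len(enc.encode(d)) for d in kept) > max_tokens:
        kept.pop()  # docs are assumed sorted best-first, so drop from the end
    return kept
```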
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## Dependencies
- gradio
- torch
- transformers
- sentence-transformers
- pymilvus
- numpy
- pandas
- scikit-learn
- tiktoken
- groq
- huggingface_hub
## License
[Your License Here]
## Acknowledgments
- RAGBench dataset
- Hugging Face Transformers
- Milvus Vector Database
- Groq API