---
title: RagBenchCapstone10
emoji: 📉
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.16.0
app_file: app.py
pinned: false
short_description: RagBench Dataset development by Saiteja
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# RAG Benchmark Evaluation System

## Overview

This project implements a Retrieval-Augmented Generation (RAG) system for evaluating different language models and reranking strategies. It provides a user-friendly interface for querying documents and analyzing the performance of various models.

## Features

- Multiple LLM support (LLaMA 3.3, Mistral 7B)
- Various reranking models:
  - MS MARCO MiniLM
  - MS MARCO TinyBERT
  - MonoT5 Base
  - MonoT5 Small
  - MonoT5 3B
- Vector similarity search using Milvus (a hedged retrieval sketch appears under "Example Sketches" below)
- Automatic document chunking and retrieval
- Performance metrics calculation
- Interactive Gradio interface

## Prerequisites

- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/rag-benchmark.git
   cd rag-benchmark
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure the models:
   - Create a `models` directory and add your language model files.
   - Create a `rerankers` directory and add your reranking model files.

## Usage

1. Start the application:

   ```bash
   python app.py
   ```

2. Access the web interface at `http://localhost:7860`.
3. Enter your question and select:
   - an LLM model (LLaMA 3.3 or Mistral 7B)
   - a reranking model (MS MARCO or MonoT5 variants)
4. Click "Evaluate Model" to get results.

## Metrics

The system calculates several performance metrics (a hedged sketch of these computations appears under "Example Sketches" below):

- RMSE for context relevance
- RMSE for context utilization
- AUCROC for adherence
- Processing time

## Reranking Models Comparison

A hedged reranking sketch appears under "Example Sketches" below.

### MS MARCO Models

- **MiniLM**: fast and efficient, with good general performance
- **TinyBERT**: lightweight; slightly lower accuracy but faster

### MonoT5 Models

- **Small**: compact and fast, suitable for limited resources
- **Base**: balanced performance and speed
- **3B**: highest accuracy, but requires more computational resources

## Error Handling

- Automatic fallback to fewer documents if token limits are exceeded
- Graceful handling of API timeouts
- Comprehensive error logging

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Dependencies

- gradio
- torch
- transformers
- sentence-transformers
- pymilvus
- numpy
- pandas
- scikit-learn
- tiktoken
- groq
- huggingface_hub

## License

[Your License Here]

## Acknowledgments

- RAGBench dataset
- Hugging Face Transformers
- Milvus Vector Database
- Groq API
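
## Example Sketches

The snippets below are illustrative sketches, not the application's actual code; collection names, field names, model checkpoints, and sample data are assumptions.

**Vector similarity search with Milvus.** A minimal retrieval sketch using `pymilvus`'s `MilvusClient` against a local Milvus Lite file (requires pymilvus 2.4+); the `rag_chunks` collection and the `all-MiniLM-L6-v2` encoder are hypothetical choices, not the app's configuration.

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

client = MilvusClient("milvus_demo.db")            # local Milvus Lite file (assumption)
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

# Default schema: an "id" primary key and a "vector" field of the given dimension.
client.create_collection(collection_name="rag_chunks", dimension=384)

chunks = [
    "RAG combines a retriever with a generator model.",
    "Milvus stores and searches dense vectors.",
]
client.insert(
    collection_name="rag_chunks",
    data=[{"id": i, "vector": encoder.encode(text).tolist(), "text": text}
          for i, text in enumerate(chunks)],
)

# Retrieve the chunks most similar to the query.
hits = client.search(
    collection_name="rag_chunks",
    data=[encoder.encode("What does Milvus do?").tolist()],
    limit=2,
    output_fields=["text"],
)
for hit in hits[0]:
    print(hit["distance"], hit["entity"]["text"])
```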
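
**Cross-encoder reranking.** A hedged sketch of how retrieved chunks can be reordered with one of the listed MS MARCO rerankers via `sentence-transformers`; the query and documents are placeholders.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is retrieval-augmented generation?"
documents = [
    "RAG combines a retriever with a generator model.",
    "Milvus is an open-source vector database.",
    "MonoT5 scores query-document pairs with a T5 model.",
]

# Score each (query, document) pair, then sort by descending relevance.
scores = reranker.predict([(query, doc) for doc in documents])
for doc, score in sorted(zip(documents, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.4f}  {doc}")
```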
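
**Metric computation.** A minimal sketch of the reported metrics using `scikit-learn`, assuming the app produces per-example predicted scores alongside RAGBench ground-truth labels; the arrays here are made-up placeholders.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

# Placeholder ground-truth vs. predicted context-relevance scores.
true_relevance = np.array([0.9, 0.2, 0.7, 0.4])
pred_relevance = np.array([0.8, 0.3, 0.6, 0.5])
rmse_relevance = np.sqrt(mean_squared_error(true_relevance, pred_relevance))

# Adherence is treated as a binary label scored against a predicted probability.
true_adherence = np.array([1, 0, 1, 1])
pred_adherence = np.array([0.85, 0.30, 0.75, 0.60])
aucroc_adherence = roc_auc_score(true_adherence, pred_adherence)

print(f"RMSE (context relevance): {rmse_relevance:.4f}")
print(f"AUCROC (adherence): {aucroc_adherence:.4f}")
```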