# RAG Benchmark Evaluation System

## Overview

This project implements a Retrieval-Augmented Generation (RAG) system for benchmarking language models and reranking strategies. It provides a Gradio-based web interface for querying documents and analyzing the performance of each model combination.
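
At a high level, each query flows through retrieval, reranking, generation, and metric computation. The sketch below is illustrative only; every function here is a hypothetical stand-in, not the project's actual API:

```python
# Illustrative outline of the evaluation flow; all names are hypothetical
# stand-ins, not the project's actual API.

def retrieve_chunks(question, top_k=20):
    """Stand-in for the Milvus similarity search (see the retrieval sketch below)."""
    return ["chunk about the topic", "another chunk"]

def rerank(question, chunks, reranker_name):
    """Stand-in for MS MARCO / MonoT5 scoring (see the reranking sketch below)."""
    return chunks

def generate_answer(question, chunks, llm_name):
    """Stand-in for the LLM call (e.g. via the Groq API)."""
    return "generated answer grounded in the retrieved chunks"

def evaluate_question(question, llm_name, reranker_name):
    chunks = retrieve_chunks(question)
    reranked = rerank(question, chunks, reranker_name)
    answer = generate_answer(question, reranked, llm_name)
    return {"answer": answer, "num_chunks_used": len(reranked)}

print(evaluate_question("What is RAG?", "LLaMA 3.3", "MonoT5 Base"))
```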

## Features

- Multiple LLM support (LLaMA 3.3, Mistral 7B)
- Various reranking models:
  - MS MARCO MiniLM
  - MS MARCO TinyBERT
  - MonoT5 Base
  - MonoT5 Small
  - MonoT5 3B
- Vector similarity search using Milvus
- Automatic document chunking and retrieval (see the retrieval sketch after this list)
- Performance metrics calculation
- Interactive Gradio interface
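
A minimal sketch of document chunking and Milvus similarity search, assuming a local Milvus instance with a pre-built collection; the collection name, field names, and embedding model below are illustrative, not the project's actual configuration:

```python
from pymilvus import connections, Collection
from sentence_transformers import SentenceTransformer

def chunk_text(text, chunk_size=500, overlap=50):
    """Split a document into overlapping character-based chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Assumed setup: Milvus running locally with a collection that stores an
# "embedding" vector field and a "text" scalar field for each chunk.
connections.connect(host="localhost", port="19530")
collection = Collection("rag_chunks")
collection.load()
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_chunks(question, top_k=5):
    """Embed the question and return the top-k most similar chunks."""
    query_vec = encoder.encode([question]).tolist()
    hits = collection.search(
        data=query_vec,
        anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 10}},
        limit=top_k,
        output_fields=["text"],
    )
    return [hit.entity.get("text") for hit in hits[0]]
```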

## Prerequisites

- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/rag-benchmark.git
   cd rag-benchmark
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure the models:

   - Create a `models` directory and add your language model files.
   - Create a `rerankers` directory and add your reranking model files.

4. Run the application:

   ```bash
   python app.py
   ```

## Usage

1. Start the application with `python app.py`

2. Access the web interface at `http://localhost:7860`

3. Enter your question and select:

   - LLM Model (LLaMA 3.3 or Mistral 7B)
   - Reranking Model (MS MARCO or MonoT5 variants)

4. Click "Evaluate Model" to get results (a minimal interface sketch follows)
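
The interface wiring looks roughly like the sketch below; the handler body, labels, and option lists are illustrative and not necessarily how `app.py` defines them:

```python
import gradio as gr

def evaluate(question, llm_model, reranker):
    # Placeholder handler; the real app runs retrieval, reranking,
    # generation, and metric computation here.
    return f"Answer for {question!r} using {llm_model} + {reranker}"

demo = gr.Interface(
    fn=evaluate,
    inputs=[
        gr.Textbox(label="Question"),
        gr.Dropdown(["LLaMA 3.3", "Mistral 7B"], label="LLM Model"),
        gr.Dropdown(
            ["MS MARCO MiniLM", "MS MARCO TinyBERT",
             "MonoT5 Small", "MonoT5 Base", "MonoT5 3B"],
            label="Reranking Model",
        ),
    ],
    outputs=gr.Textbox(label="Results"),
    title="RAG Benchmark Evaluation",
)

demo.launch(server_port=7860)  # serves the UI at http://localhost:7860
```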

## Metrics

The system calculates several performance metrics (a computation sketch follows the list):

- RMSE Context Relevance
- RMSE Context Utilization
- AUCROC Adherence
- Processing Time
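
The RMSE metrics compare predicted scores against reference annotations, and AUCROC is computed over binary adherence labels. A rough sketch, assuming per-example ground-truth and predicted values are already available as arrays (variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, roc_auc_score

def compute_metrics(true_relevance, pred_relevance,
                    true_utilization, pred_utilization,
                    adherence_labels, adherence_scores):
    """RMSE for the continuous targets, AUCROC for the binary adherence labels."""
    return {
        "rmse_context_relevance": np.sqrt(mean_squared_error(true_relevance, pred_relevance)),
        "rmse_context_utilization": np.sqrt(mean_squared_error(true_utilization, pred_utilization)),
        "aucroc_adherence": roc_auc_score(adherence_labels, adherence_scores),
    }
```

Processing time is simply the wall-clock duration of the retrieve-rerank-generate pipeline for each query.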

## Reranking Models Comparison

### MS MARCO Models

- **MiniLM**: Fast and efficient, good general performance
- **TinyBERT**: Lightweight, slightly lower accuracy but faster (a usage sketch for the cross-encoder rerankers follows the MonoT5 list)

### MonoT5 Models

- **Small**: Compact and fast, suitable for limited resources
- **Base**: Balanced performance and speed
- **3B**: Highest accuracy, requires more computational resources
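
The MS MARCO models are cross-encoders that score each (question, passage) pair directly, while the MonoT5 models are sequence-to-sequence rerankers. A minimal cross-encoder sketch using one published MS MARCO MiniLM checkpoint (swap in a local path or another variant as needed):

```python
from sentence_transformers import CrossEncoder

# One published MS MARCO MiniLM checkpoint; replace with a TinyBERT
# checkpoint or a path under the local `rerankers` directory as needed.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(question, chunks, top_k=3):
    """Score each (question, chunk) pair and keep the highest-scoring chunks."""
    scores = reranker.predict([(question, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```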

## Error Handling

- Automatic fallback to fewer documents if token limits are exceeded (see the sketch below)
- Graceful handling of API timeouts
- Comprehensive error logging
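
A rough sketch of the token-limit fallback, assuming tiktoken's `cl100k_base` encoding as a proxy for the target model's tokenizer; the actual threshold and logic in the app may differ:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # proxy for the model's tokenizer

def fit_context(question, chunks, max_tokens=6000):
    """Drop the lowest-ranked chunks until the prompt fits the token budget."""
    while chunks:
        prompt = question + "\n\n" + "\n\n".join(chunks)
        if len(encoding.encode(prompt)) <= max_tokens:
            return chunks
        chunks = chunks[:-1]  # fall back to fewer documents
    return []
```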

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Dependencies

- gradio
- torch
- transformers
- sentence-transformers
- pymilvus
- numpy
- pandas
- scikit-learn
- tiktoken
- groq
- huggingface_hub

## License

[Your License Here]

## Acknowledgments

- RAGBench dataset
- Hugging Face Transformers
- Milvus Vector Database
- Groq API