Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ license: apache-2.0
 We release the HTML pruner model used in **HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems**.
 
 <p align="left">
-Useful links: 📝 <a href="https://arxiv.org/abs/2411.02959" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/
+Useful links: 📝 <a href="https://arxiv.org/abs/2411.02959" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/papers/2411.02959" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/HtmlRAG" target="_blank">Github</a>
 </p>
 
 We propose HtmlRAG, which uses HTML instead of plain text as the format of external knowledge in RAG systems. To tackle the long context brought by HTML, we propose **Lossless HTML Cleaning** and **Two-Step Block-Tree-Based HTML Pruning**.
@@ -21,7 +21,7 @@ We propose HtmlRAG, which uses HTML instead of plain text as the format of exter
 - **Two-Step Block-Tree-Based HTML Pruning**: The block-tree-based HTML pruning consists of two steps, both of which are conducted on the block tree structure. The first pruning step uses an embedding model to calculate scores for blocks, while the second step uses a path generative model. The first step processes the result of lossless HTML cleaning, while the second step processes the result of the first pruning step.
 
 
-🌹 If you use this model, please ✨star our **[GitHub repository](https://github.com/plageon/
+🌹 If you use this model, please ✨star our **[GitHub repository](https://github.com/plageon/HtmlRAG)** to support us. Your star means a lot!
 
 ## 📦 Installation
 
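
The README text in this diff describes the first pruning step as scoring blocks with an embedding model. The sketch below illustrates that idea only in spirit and is not the HtmlRAG implementation: `embed`, `cosine`, and `prune_blocks` are hypothetical names, the bag-of-words vector is a toy stand-in for a real embedding model, blocks are assumed to be already-extracted text strings rather than nodes of a block tree, and the word budget `max_words` is an assumed parameter.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_blocks(query, blocks, max_words):
    """Score each block against the query, then greedily keep the
    best-scoring blocks that still fit in the word budget.

    Blocks that do not fit are skipped, so a smaller, lower-scoring
    block may still be kept. Survivors are returned in document order.
    """
    q = embed(query)
    ranked = sorted(
        ((cosine(q, embed(text)), i) for i, text in enumerate(blocks)),
        key=lambda pair: (-pair[0], pair[1]),  # best score first, ties by position
    )
    kept, used = [], 0
    for _score, i in ranked:
        size = len(blocks[i].split())
        if used + size <= max_words:
            kept.append(i)
            used += size
    return [blocks[i] for i in sorted(kept)]


if __name__ == "__main__":
    blocks = [
        "HtmlRAG uses HTML as the format of external knowledge",
        "Cookie banner and navigation links",
        "HTML pruning removes blocks that do not match the query",
    ]
    for block in prune_blocks("HTML pruning", blocks, max_words=19):
        print(block)
```

In the pipeline the README describes, the output of this kind of coarse step would then be handed to the second, generative pruning step, which this sketch does not attempt to model.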