zstanjj committed · Commit b150e7a · verified · 1 Parent(s): 9a772ec

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -11,7 +11,7 @@ license: apache-2.0
  We release the HTML pruner model used in **HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems**.
 
  <p align="left">
- Useful links: 📝 <a href="https://arxiv.org/abs/2411.02959" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/SlimPLM" target="_blank">Github</a>
+ Useful links: 📝 <a href="https://arxiv.org/abs/2411.02959" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/papers/2411.02959" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/HtmlRAG" target="_blank">Github</a>
  </p>
 
  We propose HtmlRAG, which uses HTML instead of plain text as the format of external knowledge in RAG systems. To tackle the long context brought by HTML, we propose **Lossless HTML Cleaning** and **Two-Step Block-Tree-Based HTML Pruning**.
@@ -21,7 +21,7 @@ We propose HtmlRAG, which uses HTML instead of plain text as the format of exter
  - **Two-Step Block-Tree-Based HTML Pruning**: The block-tree-based HTML pruning consists of two steps, both of which are conducted on the block tree structure. The first pruning step uses a embedding model to calculate scores for blocks, while the second step uses a path generative model. The first step processes the result of lossless HTML cleaning, while the second step processes the result of the first pruning step.
 
 
- 🌹 If you use this model, please ✨star our **[GitHub repository](https://github.com/plageon/HTMLRAG)** to support us. Your star means a lot!
+ 🌹 If you use this model, please ✨star our **[GitHub repository](https://github.com/plageon/HtmlRAG)** to support us. Your star means a lot!
 
  ## 📦 Installation
 