## Deployment

The easiest way to deploy Pleias-RAG-1B is through [our official library](https://github.com/Pleias/Pleias-RAG-Library). It features an API-like workflow with standardized export of the structured reasoning/answer output into JSON format. A [Colab Notebook](https://colab.research.google.com/drive/1oG0qq0I1fSEV35ezSah-a335bZqmo4_7?usp=sharing) is available for easy testing and experimentation.
A typical minimal example:

```python
from rag_library import RAGWithCitations

rag = RAGWithCitations("PleIAs/Pleias-RAG-1B")

# Define query and sources
query = "What is the capital of France?"
sources = [
    {
        "text": "Paris is the capital and most populous city of France. With an estimated population of 2,140,526 residents as of January 2019, Paris is the center of the Île-de-France metropolitan area and the hub of French economic, political, and cultural life. The city's landmarks, including the Eiffel Tower, Arc de Triomphe, and Cathedral of Notre-Dame, make it one of the world's most visited tourist destinations.",
        "metadata": {"source": "Geographic Encyclopedia", "reliability": "high"}
    },
    {
        "text": "The Eiffel Tower is located in Paris, France. It was constructed from 1887 to 1889 as the entrance to the 1889 World's Fair and was initially criticized by some of France's leading artists and intellectuals for its design. Standing at 324 meters (1,063 ft) tall, it was the tallest man-made structure in the world until the completion of the Chrysler Building in New York City in 1930. The tower receives about 7 million visitors annually and has become an iconic symbol of Paris and France.",
        "metadata": {"source": "Travel Guide", "year": 2020}
    }
]

# Generate a response
response = rag.generate(query, sources)

# Print the final answer with citations
print(response["processed"]["clean_answer"])
```

With expected output:

```
The capital of France is Paris. This is confirmed by multiple sources, with <|source_id|>1 explicitly stating that "Paris is the capital and most populous city of France"[1].

**Citations**
[1] "Paris is the capital and most populous city of France" [Source 1]
```
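The **Citations** section of the answer follows a regular pattern, so the cited excerpts can be recovered programmatically from the plain-text answer. A minimal sketch, assuming the citation-line format shown in the sample output above (the regex is our inference, not part of the official API):

```python
import re

# Sample clean answer, copied from the expected output above
clean_answer = '''The capital of France is Paris. This is confirmed by multiple sources, with <|source_id|>1 explicitly stating that "Paris is the capital and most populous city of France"[1].

**Citations**
[1] "Paris is the capital and most populous city of France" [Source 1]'''

# Citation lines look like: [1] "quoted excerpt" [Source 1]
citation_re = re.compile(r'^\[(\d+)\]\s+"(.+)"\s+\[Source (\d+)\]$', re.MULTILINE)

# Map citation number -> quoted excerpt and source index
citations = {
    int(num): {"excerpt": text, "source": int(src)}
    for num, text, src in citation_re.findall(clean_answer)
}
print(citations[1]["source"])  # 1
```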
With 1.2B parameters, Pleias-RAG-1B can be readily deployed in many constrained infrastructures, including desktop systems on CPU RAM.
We also release an [unquantized GGUF version](https://huggingface.co/PleIAs/Pleias-RAG-1B-gguf) for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most use cases, even under constrained RAM: about 20 seconds for a complex generation including reasoning traces with 8 GB of RAM or less. Since the model is unquantized, the quality of text generation should be identical to the original model.
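As a rough sanity check on why a 1.2B-parameter model fits in constrained RAM, the raw weight footprint can be estimated from the parameter count. A back-of-the-envelope sketch, assuming 16-bit weights for the unquantized checkpoint (the byte width is our assumption):

```python
params = 1.2e9          # 1.2B parameters, as stated above
bytes_per_param = 2     # assumption: fp16/bf16 weights in the unquantized GGUF
weights_gb = params * bytes_per_param / 1e9

print(f"{weights_gb:.1f} GB")  # 2.4 GB
```

At roughly 2.4 GB of weights, the model leaves ample headroom for the KV cache and the rest of the system within an 8 GB RAM budget.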