Push code to Hugging Face Space repo
- README.md +96 -0
- app.py +96 -0
- requirements.txt +6 -0
README.md
CHANGED
@@ -11,3 +11,99 @@ short_description: AI chatbot with a crafted personality (e.g., Wise Mentor)
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🤖 Prompt-Engineered Persona Agent with Mini-RAG

This project is an agentic chatbot built on a quantized LLM (`Gemma 3 1B`) that behaves according to a customizable persona prompt. It features a lightweight Retrieval-Augmented Generation (RAG) system using **TF-IDF + FAISS**, plus **dynamic context length estimation** to keep inference fast, making it well suited to CPU-only environments like Hugging Face Spaces.

---

## 🚀 Features

* ✅ **Customizable persona** via system prompt
* ✅ **Mini-RAG** using TF-IDF + FAISS to retrieve relevant past conversation turns
* ✅ **Efficient memory**: only the most relevant chat history is used
* ✅ **Dynamic context length** estimation to speed up responses
* ✅ Gradio-powered UI
* ✅ Runs on free CPU

---

## 🧠 How It Works

1. **The user submits a query** along with a system persona prompt.
2. The **top-k most similar past turns** are retrieved with FAISS over TF-IDF vectors (sketched below).
3. Only this **relevant chat history** is used to build the final prompt.
4. The LLM generates a response from the combined system prompt, retrieved context, and current user message.
5. The context length (`n_ctx`) is estimated dynamically to minimize resource usage.
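
A minimal, self-contained sketch of steps 2 and 5; the sample turns and query are invented for illustration, and the real logic lives in `app.py` below:

```python
# Illustrative sketch: TF-IDF + FAISS retrieval (step 2) and n_ctx estimation (step 5).
from sklearn.feature_extraction.text import TfidfVectorizer
import faiss

documents = [
    "user: how do I focus better? bot: Try timeboxing, short breaks, and a quiet desk.",
    "user: best way to learn ML? bot: Build small projects and read one paper a week.",
]

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(documents).toarray().astype("float32")

index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2 search, no training step
index.add(vectors)

query = "what is the best way to learn ML?"
query_vec = vectorizer.transform([query]).toarray().astype("float32")
distances, ids = index.search(query_vec, 1)  # top-k most similar past turns (k=1 here)
context = documents[ids[0][0]]               # -> the "learn ML" turn

def estimate_n_ctx(full_prompt, buffer=500):
    # Whitespace word count as a cheap proxy for token count.
    return min(3500, len(full_prompt.split()) + buffer)

print(context)
print(estimate_n_ctx(f"Context:\n{context}\n{query}"))
```

`IndexFlatL2` does brute-force exact search, which is plenty fast for a single session's chat history.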

---

## 🧪 Example Personas

You can change the persona in the UI's system prompt box:

* 📚 `"You are a wise academic advisor who offers up to 3 concise, practical suggestions."`
* 🧘 `"You are a calm mindfulness coach. Always reply gently and with encouragement."`
* 🕵️ `"You are an investigative assistant. Be logical, skeptical, and fact-focused."`

---

## 📦 Installation

**For local setup:**

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/Prompt-Persona-Agent
cd Prompt-Persona-Agent
pip install -r requirements.txt
```

Set your Hugging Face token as an environment variable (the app needs it to download the gated Gemma model):

```bash
export HF_TOKEN=your_huggingface_token
```

Then run:

```bash
python app.py
```

---

## 📁 Files

* `app.py`: main application with chat, mini-RAG, and dynamic context estimation
* `requirements.txt`: Python dependencies
* `README.md`: this file

---

## 🛠️ Tech Stack

* [Gradio](https://gradio.app/)
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
* [FAISS](https://github.com/facebookresearch/faiss)
* [scikit-learn (TF-IDF)](https://scikit-learn.org/)
* [Gemma 3 1B IT QAT GGUF](https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-gguf)

---

## 📌 Limitations

* Retrieval is basic TF-IDF + FAISS; it could be extended with semantic embedding models (see the sketch below).
* Not all LLMs strictly follow the persona; prompt tuning helps but is not perfect.
* For longer-term memory, a database plus a summarizer would work better.
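
For instance, a hypothetical drop-in that replaces TF-IDF with dense sentence embeddings, assuming `sentence-transformers` were added as a dependency (the model name is just an example):

```python
# Sketch only: semantic retrieval via sentence embeddings instead of TF-IDF.
from sentence_transformers import SentenceTransformer
import faiss

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

def build_semantic_index(documents):
    vectors = embedder.encode(documents, convert_to_numpy=True).astype("float32")
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index
```

Unlike TF-IDF, embeddings match on meaning rather than shared words, at the cost of a larger dependency and slower indexing on CPU.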

---

## 📤 Deploy to Hugging Face Spaces

> Runs on CPU only; no paid GPU is required.

Make sure `HF_TOKEN` is set as a secret or environment variable in your Hugging Face Space.

---
app.py
ADDED
@@ -0,0 +1,96 @@
import os
import gradio as gr
from llama_cpp import Llama
from huggingface_hub import snapshot_download, login
from sklearn.feature_extraction.text import TfidfVectorizer
import faiss
import numpy as np

#--------------------MODEL SETUP--------------------
MODEL_REPO = "google/gemma-3-1b-it-qat-q4_0-gguf"
MODEL_PATH = "./gemma-3-1b-it-qat-q4_0/gemma-3-1b-it-q4_0.gguf"
MODEL_DIR = "./gemma-3-1b-it-qat-q4_0"
DEFAULT_SYSTEM_PROMPT = (
    "You are a Wise Mentor. Speak in a calm and concise manner. "
    "If asked for advice, give a maximum of 3 actionable steps. "
    "Avoid unnecessary elaboration. Decline unethical or harmful requests."
)

# Hugging Face token and one-time model download
hf_token = os.environ.get("HF_TOKEN")
if not os.path.exists(MODEL_PATH):
    if not hf_token:
        raise ValueError("HF_TOKEN is missing. Set it as an environment variable.")

    login(hf_token)
    snapshot_download(repo_id=MODEL_REPO, local_dir=MODEL_DIR, local_dir_use_symlinks=False)

#--------------------RAG SETUP------------------------
documents = []  # stores all chat turns as "user: ... bot: ..." strings
vectorizer = TfidfVectorizer()
index = None

def update_rag_index():
    """Refit TF-IDF over all stored turns and rebuild the FAISS index."""
    global index
    if not documents:
        return
    vectors = vectorizer.fit_transform(documents).toarray().astype('float32')
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)

def retrieve_relevant_docs(query, k=2):
    """Return the k stored turns most similar to the query ('' if no index yet)."""
    if not documents or index is None:
        return ""

    query_vec = vectorizer.transform([query]).toarray().astype('float32')
    D, I = index.search(query_vec, k)
    return "\n".join(documents[i] for i in I[0] if i < len(documents))

#-----------------------CONTEXT LENGTH ESTIMATION---------------------
def estimate_n_ctx(full_prompt, buffer=500):
    # Whitespace word count as a cheap proxy for token count, plus a safety
    # buffer, capped at 3500 to keep the context window small on CPU.
    total_tokens = len(full_prompt.split())
    return min(3500, total_tokens + buffer)

#-----------------------CHAT FUNCTION-----------------------
def chat(user_input, history, system_prompt):
    # Retrieve only the most relevant past turns instead of the full history.
    relevant_context = retrieve_relevant_docs(user_input)
    # Recent turns from the Gradio history, formatted as tagged pairs.
    formatted_turns = "".join(f"<user>{u}</user><bot>{b}</bot>" for u, b in history)

    full_prompt = (
        f"<s>[INST] <<SYS>>\n{system_prompt}\nContext:\n{relevant_context}\n<</SYS>>\n"
        f"{formatted_turns}<user>{user_input}[/INST]"
    )

    # Dynamically estimate n_ctx so the model allocates only what it needs.
    n_ctx = estimate_n_ctx(full_prompt=full_prompt)

    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=n_ctx,
        n_threads=2,
        n_batch=128
    )

    output = llm(full_prompt, max_tokens=256, stop=["</s>", "<user>"])
    bot_reply = output["choices"][0]["text"].strip()

    # Store the finished turn and refresh the retrieval index.
    documents.append(f"user: {user_input} bot: {bot_reply}")
    update_rag_index()

    history.append((user_input, bot_reply))
    return "", history

#-----------------------UI---------------------
with gr.Blocks() as demo:
    gr.Markdown("# 🤖 Persona Agent with Mini-RAG + Dynamic Context")
    with gr.Row():
        system_prompt_box = gr.Textbox(label="System Prompt", value=DEFAULT_SYSTEM_PROMPT, lines=3)
    chatbot = gr.Chatbot()
    msg = gr.Textbox(label="Your Message")
    clear = gr.Button("🗑️ Clear")

    msg.submit(chat, [msg, chatbot, system_prompt_box], [msg, chatbot])
    clear.click(lambda: [], None, chatbot)

demo.launch()
requirements.txt
ADDED
@@ -0,0 +1,6 @@
gradio
llama-cpp-python
huggingface_hub
scikit-learn
faiss-cpu
numpy