ujwal55 committed
Commit 459ed9b · 1 Parent(s): 3c77517

Push code to Hugging Face Space repo

Files changed (3)
  1. README.md +96 -0
  2. app.py +96 -0
  3. requirements.txt +6 -0
README.md CHANGED
@@ -11,3 +11,99 @@ short_description: AI chatbot with a crafted personality (e.g., Wise Mentor)
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # 🤖 Prompt-Engineered Persona Agent with Mini-RAG
+
+ This project is an agentic chatbot built on a quantized LLM (`Gemma 3 1B`) that behaves according to a customizable persona prompt. It features a lightweight Retrieval-Augmented Generation (RAG) system using **TF-IDF + FAISS**, plus **dynamic context length estimation** to optimize inference time, making it well suited to CPU-only environments such as Hugging Face Spaces.
+
+ ---
+
+ ## 🚀 Features
+
+ * ✅ **Customizable persona** via system prompt
+ * ✅ **Mini-RAG** using TF-IDF + FAISS to retrieve relevant past conversation turns
+ * ✅ **Efficient memory**: only the most relevant chat history is used
+ * ✅ **Dynamic context length** estimation to speed up response time
+ * ✅ Gradio-powered UI
+ * ✅ Runs on a free CPU
+
+ ---
+
+ ## 🧠 How It Works
+
+ 1. **The user submits a query** along with a system persona prompt.
+ 2. **The top-k most similar past turns** are retrieved using FAISS over TF-IDF vectors (see the sketch below).
+ 3. Only the **relevant chat history** is used to build the final prompt.
+ 4. The LLM generates a response from the combined system prompt, retrieved context, and current user message.
+ 5. The context length (`n_ctx`) is estimated dynamically to minimize resource usage.
+
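+ As a minimal sketch of step 2, assuming the same TF-IDF + FAISS combination `app.py` uses (the names and sample documents here are illustrative, not the exact ones in the app):
+
+ ```python
+ import faiss
+ from sklearn.feature_extraction.text import TfidfVectorizer
+
+ documents = [
+     "user: how do I focus? bot: Try 25-minute work blocks.",
+     "user: best way to learn Python? bot: Build small projects.",
+ ]
+
+ # Vectorize the stored turns and index them in a flat L2 FAISS index.
+ vectorizer = TfidfVectorizer()
+ vectors = vectorizer.fit_transform(documents).toarray().astype("float32")
+ index = faiss.IndexFlatL2(vectors.shape[1])
+ index.add(vectors)
+
+ # Retrieve the most similar stored turn for a new query.
+ query_vec = vectorizer.transform(["how can I concentrate?"]).toarray().astype("float32")
+ distances, ids = index.search(query_vec, 1)
+ print("\n".join(documents[i] for i in ids[0] if i < len(documents)))
+ ```
+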
+ ---
+
+ ## 🧪 Example Personas
+
+ You can change the persona in the UI system prompt box:
+
+ * 📚 `"You are a wise academic advisor who offers up to 3 concise, practical suggestions."`
+ * 🧘 `"You are a calm mindfulness coach. Always reply gently and with encouragement."`
+ * 🕵️ `"You are an investigative assistant. Be logical, skeptical, and fact-focused."`
+
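+ Whatever persona you type is injected verbatim into the system slot of the prompt template that `app.py` builds, roughly like this (values here are illustrative):
+
+ ```python
+ system_prompt = "You are a calm mindfulness coach. Always reply gently and with encouragement."
+ retrieved_context = "user: I feel stressed bot: Take three slow breaths."
+ user_input = "I can't sleep."
+
+ # Same template shape as app.py: system prompt + retrieved context + current turn.
+ full_prompt = (
+     f"<s>[INST] <<SYS>>\n{system_prompt}\nContext:\n{retrieved_context}\n<</SYS>>\n"
+     f"<user>{user_input}[/INST]"
+ )
+ ```
+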
+ ---
+
+ ## 📦 Installation
+
+ **For local setup:**
+
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/Prompt-Persona-Agent
+ cd Prompt-Persona-Agent
+ pip install -r requirements.txt
+ ```
+
+ Set an environment variable with your Hugging Face token:
+
+ ```bash
+ export HF_TOKEN=your_huggingface_token
+ ```
+
+ Then run:
+
+ ```bash
+ python app.py
+ ```
+
+ ---
+
+ ## 📁 Files
+
+ * `app.py`: Main application with chat + RAG + dynamic context
+ * `requirements.txt`: All Python dependencies
+ * `README.md`: This file
+
+ ---
+
+ ## 🛠️ Tech Stack
+
+ * [Gradio](https://gradio.app/)
+ * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
+ * [FAISS](https://github.com/facebookresearch/faiss)
+ * [scikit-learn (TF-IDF)](https://scikit-learn.org/)
+ * [Gemma 3 1B IT GGUF](https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-gguf)
+
+ ---
+
+ ## 📌 Limitations
+
+ * Retrieval is basic TF-IDF + FAISS; it can be extended with semantic embedding models (see the sketch below).
+ * Not all LLMs strictly follow a persona; prompt tuning helps but is not perfect.
+ * For longer-term memory, a database plus a summarizer would be a better fit.
+
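+ For the first point, a hypothetical semantic-retrieval variant might look like the sketch below. It assumes the `sentence-transformers` package and the `all-MiniLM-L6-v2` model, neither of which is part of this project:
+
+ ```python
+ import faiss
+ from sentence_transformers import SentenceTransformer
+
+ # Small CPU-friendly encoder (assumed dependency, not in requirements.txt).
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ documents = ["user: how do I focus? bot: Try 25-minute work blocks."]
+
+ # Encode past turns into dense vectors and index them, mirroring the TF-IDF flow.
+ vectors = model.encode(documents, convert_to_numpy=True).astype("float32")
+ index = faiss.IndexFlatL2(vectors.shape[1])
+ index.add(vectors)
+
+ query = model.encode(["how can I concentrate?"], convert_to_numpy=True).astype("float32")
+ distances, ids = index.search(query, 1)
+ print(documents[ids[0][0]])
+ ```
+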
+ ---
+
+ ## 📤 Deploy to Hugging Face Spaces
+
+ > Uses only the CPU; no paid GPU is required.
+
+ Make sure your `HF_TOKEN` is set as a secret or environment variable in your Hugging Face Space.
+
+ ---
+
app.py ADDED
@@ -0,0 +1,96 @@
+ import os
+ import gradio as gr
+ from llama_cpp import Llama
+ from huggingface_hub import snapshot_download, login
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ import faiss
+ import numpy as np
+
+ # -------------------- MODEL SETUP --------------------
+ MODEL_REPO = "google/gemma-3-1b-it-qat-q4_0-gguf"
+ MODEL_PATH = "./gemma-3-1b-it-qat-q4_0/gemma-3-1b-it-q4_0.gguf"
+ MODEL_DIR = "./gemma-3-1b-it-qat-q4_0"
+ DEFAULT_SYSTEM_PROMPT = (
+     "You are a Wise Mentor. Speak in a calm and concise manner. "
+     "If asked for advice, give a maximum of 3 actionable steps. "
+     "Avoid unnecessary elaboration. Decline unethical or harmful requests."
+ )
+
+ # Hugging Face token and model download
+ hf_token = os.environ.get("HF_TOKEN")
+ if not os.path.exists(MODEL_PATH):
+     if not hf_token:
+         raise ValueError("HF_TOKEN is missing. Set it as an environment variable.")
+
+     login(hf_token)
+     snapshot_download(repo_id=MODEL_REPO, local_dir=MODEL_DIR, local_dir_use_symlinks=False)
+
+ # -------------------- RAG SETUP --------------------
+ documents = []  # stores all chat turns
+ vectorizer = TfidfVectorizer()
+ index = None
+
+ def update_rag_index():
+     global index
+     if not documents:
+         return
+     vectors = vectorizer.fit_transform(documents).toarray().astype("float32")  # re-fit TF-IDF over all stored turns
+     index = faiss.IndexFlatL2(vectors.shape[1])
+     index.add(vectors)
+
+ def retrieve_relevant_docs(query, k=2):
+     if not documents or index is None:
+         return ""
+
+     query_vec = vectorizer.transform([query]).toarray().astype("float32")
+     D, I = index.search(query_vec, k)
+     return "\n".join(documents[i] for i in I[0] if i < len(documents))
+
+
+ # -------------------- CONTEXT LENGTH ESTIMATION --------------------
+ def estimate_n_ctx(full_prompt, buffer=500):
+     total_tokens = len(full_prompt.split())  # rough whitespace-based token estimate
+     return min(3500, total_tokens + buffer)
+
+ # -------------------- CHAT FUNCTION --------------------
+ def chat(user_input, history, system_prompt):
+     relevant_context = retrieve_relevant_docs(user_input)
+     formatted_turns = "".join(f"<user>{u}</user><bot>{b}</bot>" for u, b in history)
+
+     full_prompt = (
+         f"<s>[INST] <<SYS>>\n{system_prompt}\nContext:\n{relevant_context}\n<</SYS>>\n"
+         f"{formatted_turns}<user>{user_input}[/INST]"
+     )
+
+     # Dynamically estimate n_ctx; the model is reloaded each turn so its window tracks the prompt size
+     n_ctx = estimate_n_ctx(full_prompt=full_prompt)
+
+     llm = Llama(
+         model_path=MODEL_PATH,
+         n_ctx=n_ctx,
+         n_threads=2,
+         n_batch=128
+     )
+
+     output = llm(full_prompt, max_tokens=256, stop=["</s>", "<user>"])
+     bot_reply = output["choices"][0]["text"].strip()
+
+     documents.append(f"user: {user_input} bot: {bot_reply}")
+     update_rag_index()
+
+     history.append((user_input, bot_reply))
+     return "", history
+
+ # -------------------- UI --------------------
+ with gr.Blocks() as demo:
+     gr.Markdown("# 🤖 Persona Agent with Mini-RAG + Dynamic Context")
+     with gr.Row():
+         system_prompt_box = gr.Textbox(label="System Prompt", value=DEFAULT_SYSTEM_PROMPT, lines=3)
+     chatbot = gr.Chatbot()
+     msg = gr.Textbox(label="Your Message")
+     clear = gr.Button("🗑️ Clear")
+
+     msg.submit(chat, [msg, chatbot, system_prompt_box], [msg, chatbot])
+     clear.click(lambda: [], None, chatbot)
+
+ demo.launch()
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio
+ llama-cpp-python
+ faiss-cpu
+ scikit-learn
+ numpy
+ huggingface_hub