ujwal55 committed
Commit 459ed9b · 1 Parent(s): 3c77517

Push code to Hugging Face Space repo

Files changed (3)
  1. README.md +96 -0
  2. app.py +96 -0
  3. requirements.txt +6 -0
README.md CHANGED
@@ -11,3 +11,99 @@ short_description: AI chatbot with a crafted personality (e.g., Wise Mentor)
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # 🤖 Prompt-Engineered Persona Agent with Mini-RAG
+
+ This project is an agentic chatbot built on a quantized LLM (`Gemma 3 1B`) that behaves according to a customizable persona prompt. It features a lightweight Retrieval-Augmented Generation (RAG) system using **TF-IDF + FAISS**, plus **dynamic context length estimation** to optimize inference time, making it well suited to CPU-only environments such as Hugging Face Spaces.
+
+ ---
+
+ ## 🚀 Features
+
+ * ✅ **Customizable persona** via system prompt
+ * ✅ **Mini-RAG** using TF-IDF + FAISS to retrieve relevant past conversation turns
+ * ✅ **Efficient memory**: only the most relevant chat history is used
+ * ✅ **Dynamic context length** estimation to speed up response time
+ * ✅ Gradio-powered UI
+ * ✅ Runs on a free CPU
+
+ ---
+
+ ## 🧠 How It Works
+
+ 1. **The user submits a query** along with a system persona prompt.
+ 2. **The top-k most similar past turns** are retrieved using FAISS over TF-IDF vectors (see the sketch below).
+ 3. Only the **relevant chat history** is used to build the final prompt.
+ 4. The LLM generates a response from the combined system prompt, retrieved context, and current user message.
+ 5. The context length (`n_ctx`) is estimated dynamically to minimize resource usage.
+
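+ As a minimal sketch of step 2, assuming the same TF-IDF + FAISS combination `app.py` uses (the names and sample documents here are illustrative, not the exact ones in the app):
+
+ ```python
+ import faiss
+ from sklearn.feature_extraction.text import TfidfVectorizer
+
+ documents = [
+     "user: how do I focus? bot: Try 25-minute work blocks.",
+     "user: best way to learn Python? bot: Build small projects.",
+ ]
+
+ # Vectorize the stored turns and index them in a flat L2 FAISS index.
+ vectorizer = TfidfVectorizer()
+ vectors = vectorizer.fit_transform(documents).toarray().astype("float32")
+ index = faiss.IndexFlatL2(vectors.shape[1])
+ index.add(vectors)
+
+ # Retrieve the most similar stored turn for a new query.
+ query_vec = vectorizer.transform(["how can I concentrate?"]).toarray().astype("float32")
+ distances, ids = index.search(query_vec, 1)
+ print("\n".join(documents[i] for i in ids[0] if i < len(documents)))
+ ```
+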
+ ---
+
+ ## 🧪 Example Personas
+
+ You can change the persona in the UI system prompt box:
+
+ * 📚 `"You are a wise academic advisor who offers up to 3 concise, practical suggestions."`
+ * 🧘 `"You are a calm mindfulness coach. Always reply gently and with encouragement."`
+ * 🕵️ `"You are an investigative assistant. Be logical, skeptical, and fact-focused."`
+
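+ Whatever persona you type is injected verbatim into the system slot of the prompt template that `app.py` builds, roughly like this (values here are illustrative):
+
+ ```python
+ system_prompt = "You are a calm mindfulness coach. Always reply gently and with encouragement."
+ retrieved_context = "user: I feel stressed bot: Take three slow breaths."
+ user_input = "I can't sleep."
+
+ # Same template shape as app.py: system prompt + retrieved context + current turn.
+ full_prompt = (
+     f"<s>[INST] <<SYS>>\n{system_prompt}\nContext:\n{retrieved_context}\n<</SYS>>\n"
+     f"<user>{user_input}[/INST]"
+ )
+ ```
+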
+ ---
+
+ ## 📦 Installation
+
+ **For local setup:**
+
+ ```bash
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/Prompt-Persona-Agent
+ cd Prompt-Persona-Agent
+ pip install -r requirements.txt
+ ```
+
+ Set an environment variable with your Hugging Face token:
+
+ ```bash
+ export HF_TOKEN=your_huggingface_token
+ ```
+
+ Then run:
+
+ ```bash
+ python app.py
+ ```
+
+ ---
+
+ ## 📁 Files
+
+ * `app.py`: Main application with chat + RAG + dynamic context
+ * `requirements.txt`: All Python dependencies
+ * `README.md`: This file
+
+ ---
+
+ ## 🛠️ Tech Stack
+
+ * [Gradio](https://gradio.app/)
+ * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
+ * [FAISS](https://github.com/facebookresearch/faiss)
+ * [scikit-learn (TF-IDF)](https://scikit-learn.org/)
+ * [Gemma 3 1B IT GGUF](https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-gguf)
+
+ ---
+
+ ## 📌 Limitations
+
+ * Retrieval is basic TF-IDF + FAISS; it can be extended with semantic embedding models (see the sketch below).
+ * Not all LLMs strictly follow a persona; prompt tuning helps but is not perfect.
+ * For longer-term memory, a database plus a summarizer would be a better fit.
+
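+ For the first point, a hypothetical semantic-retrieval variant might look like the sketch below. It assumes the `sentence-transformers` package and the `all-MiniLM-L6-v2` model, neither of which is part of this project:
+
+ ```python
+ import faiss
+ from sentence_transformers import SentenceTransformer
+
+ # Small CPU-friendly encoder (assumed dependency, not in requirements.txt).
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ documents = ["user: how do I focus? bot: Try 25-minute work blocks."]
+
+ # Encode past turns into dense vectors and index them, mirroring the TF-IDF flow.
+ vectors = model.encode(documents, convert_to_numpy=True).astype("float32")
+ index = faiss.IndexFlatL2(vectors.shape[1])
+ index.add(vectors)
+
+ query = model.encode(["how can I concentrate?"], convert_to_numpy=True).astype("float32")
+ distances, ids = index.search(query, 1)
+ print(documents[ids[0][0]])
+ ```
+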
+ ---
+
+ ## 📤 Deploy to Hugging Face Spaces
+
+ > Uses only the CPU; no paid GPU is required.
+
+ Make sure your `HF_TOKEN` is set as a secret or environment variable in your Hugging Face Space.
+
+ ---
+
app.py ADDED
@@ -0,0 +1,96 @@
+ import os
+ import gradio as gr
+ from llama_cpp import Llama
+ from huggingface_hub import snapshot_download, login
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ import faiss
+ import numpy as np
+
+ # -------------------- MODEL SETUP --------------------
+ MODEL_REPO = "google/gemma-3-1b-it-qat-q4_0-gguf"
+ MODEL_PATH = "./gemma-3-1b-it-qat-q4_0/gemma-3-1b-it-q4_0.gguf"
+ MODEL_DIR = "./gemma-3-1b-it-qat-q4_0"
+ DEFAULT_SYSTEM_PROMPT = (
+     "You are a Wise Mentor. Speak in a calm and concise manner. "
+     "If asked for advice, give a maximum of 3 actionable steps. "
+     "Avoid unnecessary elaboration. Decline unethical or harmful requests."
+ )
+
+ # Hugging Face token and model download
+ hf_token = os.environ.get("HF_TOKEN")
+ if not os.path.exists(MODEL_PATH):
+     if not hf_token:
+         raise ValueError("HF_TOKEN is missing. Set it as an environment variable.")
+
+     login(hf_token)
+     snapshot_download(repo_id=MODEL_REPO, local_dir=MODEL_DIR, local_dir_use_symlinks=False)
+
+ # -------------------- RAG SETUP --------------------
+ documents = []  # stores all chat turns
+ vectorizer = TfidfVectorizer()
+ index = None
+
+ def update_rag_index():
+     global index
+     if not documents:
+         return
+     vectors = vectorizer.fit_transform(documents).toarray().astype("float32")  # re-fit TF-IDF over all stored turns
+     index = faiss.IndexFlatL2(vectors.shape[1])
+     index.add(vectors)
+
+ def retrieve_relevant_docs(query, k=2):
+     if not documents or index is None:
+         return ""
+
+     query_vec = vectorizer.transform([query]).toarray().astype("float32")
+     D, I = index.search(query_vec, k)
+     return "\n".join(documents[i] for i in I[0] if i < len(documents))
+
+
+ # -------------------- CONTEXT LENGTH ESTIMATION --------------------
+ def estimate_n_ctx(full_prompt, buffer=500):
+     total_tokens = len(full_prompt.split())  # rough whitespace-based token estimate
+     return min(3500, total_tokens + buffer)
+
+ # -------------------- CHAT FUNCTION --------------------
+ def chat(user_input, history, system_prompt):
+     relevant_context = retrieve_relevant_docs(user_input)
+     formatted_turns = "".join(f"<user>{u}</user><bot>{b}</bot>" for u, b in history)
+
+     full_prompt = (
+         f"<s>[INST] <<SYS>>\n{system_prompt}\nContext:\n{relevant_context}\n<</SYS>>\n"
+         f"{formatted_turns}<user>{user_input}[/INST]"
+     )
+
+     # Dynamically estimate n_ctx; the model is reloaded each turn so its window tracks the prompt size
+     n_ctx = estimate_n_ctx(full_prompt=full_prompt)
+
+     llm = Llama(
+         model_path=MODEL_PATH,
+         n_ctx=n_ctx,
+         n_threads=2,
+         n_batch=128
+     )
+
+     output = llm(full_prompt, max_tokens=256, stop=["</s>", "<user>"])
+     bot_reply = output["choices"][0]["text"].strip()
+
+     documents.append(f"user: {user_input} bot: {bot_reply}")
+     update_rag_index()
+
+     history.append((user_input, bot_reply))
+     return "", history
+
+ # -------------------- UI --------------------
+ with gr.Blocks() as demo:
+     gr.Markdown("# 🤖 Persona Agent with Mini-RAG + Dynamic Context")
+     with gr.Row():
+         system_prompt_box = gr.Textbox(label="System Prompt", value=DEFAULT_SYSTEM_PROMPT, lines=3)
+     chatbot = gr.Chatbot()
+     msg = gr.Textbox(label="Your Message")
+     clear = gr.Button("🗑️ Clear")
+
+     msg.submit(chat, [msg, chatbot, system_prompt_box], [msg, chatbot])
+     clear.click(lambda: [], None, chatbot)
+
+ demo.launch()
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio
+ llama-cpp-python
+ faiss-cpu
+ scikit-learn
+ numpy
+ huggingface_hub