Upload folder using huggingface_hub
Browse files- .github/workflows/update_space.yml +28 -0
- .gitignore +169 -0
- README.md +59 -10
- app.py +88 -0
- bots.py +70 -0
- data.py +80 -0
- prompts.py +108 -0
- run.py +30 -0
- test_bots.py +14 -0
- tools/squad_retriever.py +30 -0
- tools/text_to_image.py +13 -0
- tools/visual_qa.py +191 -0
- tools/web_surfer.py +205 -0
- utils.py +67 -0
.github/workflows/update_space.yml
ADDED
@@ -0,0 +1,28 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
name: Run Python script
|
2 |
+
|
3 |
+
on:
|
4 |
+
push:
|
5 |
+
branches:
|
6 |
+
- main
|
7 |
+
|
8 |
+
jobs:
|
9 |
+
build:
|
10 |
+
runs-on: ubuntu-latest
|
11 |
+
|
12 |
+
steps:
|
13 |
+
- name: Checkout
|
14 |
+
uses: actions/checkout@v2
|
15 |
+
|
16 |
+
- name: Set up Python
|
17 |
+
uses: actions/setup-python@v2
|
18 |
+
with:
|
19 |
+
python-version: '3.9'
|
20 |
+
|
21 |
+
- name: Install Gradio
|
22 |
+
run: python -m pip install gradio
|
23 |
+
|
24 |
+
- name: Log in to Hugging Face
|
25 |
+
run: python -c 'import huggingface_hub; huggingface_hub.login(token="${{ secrets.hf_token }}")'
|
26 |
+
|
27 |
+
- name: Deploy to Spaces
|
28 |
+
run: gradio deploy
|
.gitignore
ADDED
@@ -0,0 +1,169 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# MacOS
|
2 |
+
.DS_Store
|
3 |
+
|
4 |
+
# Data
|
5 |
+
chroma_db/
|
6 |
+
data/
|
7 |
+
|
8 |
+
# Byte-compiled / optimized / DLL files
|
9 |
+
__pycache__/
|
10 |
+
*.py[cod]
|
11 |
+
*$py.class
|
12 |
+
|
13 |
+
# C extensions
|
14 |
+
*.so
|
15 |
+
|
16 |
+
# Distribution / packaging
|
17 |
+
.Python
|
18 |
+
build/
|
19 |
+
develop-eggs/
|
20 |
+
dist/
|
21 |
+
downloads/
|
22 |
+
eggs/
|
23 |
+
.eggs/
|
24 |
+
lib/
|
25 |
+
lib64/
|
26 |
+
parts/
|
27 |
+
sdist/
|
28 |
+
var/
|
29 |
+
wheels/
|
30 |
+
share/python-wheels/
|
31 |
+
*.egg-info/
|
32 |
+
.installed.cfg
|
33 |
+
*.egg
|
34 |
+
MANIFEST
|
35 |
+
|
36 |
+
# PyInstaller
|
37 |
+
# Usually these files are written by a python script from a template
|
38 |
+
# before PyInstaller builds the exe, so as to inject date/other infos into it.
|
39 |
+
*.manifest
|
40 |
+
*.spec
|
41 |
+
|
42 |
+
# Installer logs
|
43 |
+
pip-log.txt
|
44 |
+
pip-delete-this-directory.txt
|
45 |
+
|
46 |
+
# Unit test / coverage reports
|
47 |
+
htmlcov/
|
48 |
+
.tox/
|
49 |
+
.nox/
|
50 |
+
.coverage
|
51 |
+
.coverage.*
|
52 |
+
.cache
|
53 |
+
nosetests.xml
|
54 |
+
coverage.xml
|
55 |
+
*.cover
|
56 |
+
*.py,cover
|
57 |
+
.hypothesis/
|
58 |
+
.pytest_cache/
|
59 |
+
cover/
|
60 |
+
|
61 |
+
# Translations
|
62 |
+
*.mo
|
63 |
+
*.pot
|
64 |
+
|
65 |
+
# Django stuff:
|
66 |
+
*.log
|
67 |
+
local_settings.py
|
68 |
+
db.sqlite3
|
69 |
+
db.sqlite3-journal
|
70 |
+
|
71 |
+
# Flask stuff:
|
72 |
+
instance/
|
73 |
+
.webassets-cache
|
74 |
+
|
75 |
+
# Scrapy stuff:
|
76 |
+
.scrapy
|
77 |
+
|
78 |
+
# Sphinx documentation
|
79 |
+
docs/_build/
|
80 |
+
|
81 |
+
# PyBuilder
|
82 |
+
.pybuilder/
|
83 |
+
target/
|
84 |
+
|
85 |
+
# Jupyter Notebook
|
86 |
+
.ipynb_checkpoints
|
87 |
+
|
88 |
+
# IPython
|
89 |
+
profile_default/
|
90 |
+
ipython_config.py
|
91 |
+
|
92 |
+
# pyenv
|
93 |
+
# For a library or package, you might want to ignore these files since the code is
|
94 |
+
# intended to run in multiple environments; otherwise, check them in:
|
95 |
+
# .python-version
|
96 |
+
|
97 |
+
# pipenv
|
98 |
+
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
|
99 |
+
# However, in case of collaboration, if having platform-specific dependencies or dependencies
|
100 |
+
# having no cross-platform support, pipenv may install dependencies that don't work, or not
|
101 |
+
# install all needed dependencies.
|
102 |
+
#Pipfile.lock
|
103 |
+
|
104 |
+
# poetry
|
105 |
+
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
|
106 |
+
# This is especially recommended for binary packages to ensure reproducibility, and is more
|
107 |
+
# commonly ignored for libraries.
|
108 |
+
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
|
109 |
+
#poetry.lock
|
110 |
+
|
111 |
+
# pdm
|
112 |
+
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
|
113 |
+
#pdm.lock
|
114 |
+
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
|
115 |
+
# in version control.
|
116 |
+
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
|
117 |
+
.pdm.toml
|
118 |
+
.pdm-python
|
119 |
+
.pdm-build/
|
120 |
+
|
121 |
+
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
|
122 |
+
__pypackages__/
|
123 |
+
|
124 |
+
# Celery stuff
|
125 |
+
celerybeat-schedule
|
126 |
+
celerybeat.pid
|
127 |
+
|
128 |
+
# SageMath parsed files
|
129 |
+
*.sage.py
|
130 |
+
|
131 |
+
# Environments
|
132 |
+
.env
|
133 |
+
.venv
|
134 |
+
env/
|
135 |
+
venv/
|
136 |
+
ENV/
|
137 |
+
env.bak/
|
138 |
+
venv.bak/
|
139 |
+
|
140 |
+
# Spyder project settings
|
141 |
+
.spyderproject
|
142 |
+
.spyproject
|
143 |
+
|
144 |
+
# Rope project settings
|
145 |
+
.ropeproject
|
146 |
+
|
147 |
+
# mkdocs documentation
|
148 |
+
/site
|
149 |
+
|
150 |
+
# mypy
|
151 |
+
.mypy_cache/
|
152 |
+
.dmypy.json
|
153 |
+
dmypy.json
|
154 |
+
|
155 |
+
# Pyre type checker
|
156 |
+
.pyre/
|
157 |
+
|
158 |
+
# pytype static type analyzer
|
159 |
+
.pytype/
|
160 |
+
|
161 |
+
# Cython debug symbols
|
162 |
+
cython_debug/
|
163 |
+
|
164 |
+
# PyCharm
|
165 |
+
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
|
166 |
+
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
|
167 |
+
# and can be added to the global gitignore or merged into this file. For a more nuclear
|
168 |
+
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
|
169 |
+
#.idea/
|
README.md
CHANGED
@@ -1,14 +1,63 @@
|
|
1 |
---
|
2 |
-
title:
|
3 |
-
emoji: 👁
|
4 |
-
colorFrom: gray
|
5 |
-
colorTo: pink
|
6 |
-
sdk: gradio
|
7 |
-
sdk_version: 5.0.1
|
8 |
app_file: app.py
|
9 |
-
|
10 |
-
|
11 |
-
short_description: SQuAD Question Answering Agent
|
12 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
13 |
|
14 |
-
|
|
|
|
|
|
1 |
---
|
2 |
+
title: SQuAD_Agent_Experiment
|
|
|
|
|
|
|
|
|
|
|
3 |
app_file: app.py
|
4 |
+
sdk: gradio
|
5 |
+
sdk_version: 4.44.0
|
|
|
6 |
---
|
7 |
+
# SQuAD_Agent_Experiment
|
8 |
+
|
9 |
+
## Overview
|
10 |
+
|
11 |
+
The project is built using Transformers Agents 2.0, and uses the Stanford SQuAD dataset for training. The chatbot is designed to answer questions about the dataset, while also incorporating conversational context and various tools to provide a more natural and engaging conversational experience.
|
12 |
+
|
13 |
+
## Getting Started
|
14 |
+
|
15 |
+
1. Install dependencies:
|
16 |
+
|
17 |
+
```bash
|
18 |
+
pip install -r requirements.txt
|
19 |
+
```
|
20 |
+
|
21 |
+
1. Set up required keys:
|
22 |
+
|
23 |
+
```bash
|
24 |
+
HUGGINGFACE_API_TOKEN=<your token>
|
25 |
+
```
|
26 |
+
|
27 |
+
1. Run the app:
|
28 |
+
|
29 |
+
```bash
|
30 |
+
python app.py
|
31 |
+
```
|
32 |
+
|
33 |
+
## Methods Used
|
34 |
+
|
35 |
+
1. SQuAD Dataset: The dataset used for training the chatbot is the Stanford SQuAD dataset, which contains over 100,000 questions and answers extracted from 500+ articles.
|
36 |
+
2. RAG: RAG is a technique used to improve the accuracy of chatbots by using a custom knowledge base. In this project, the Stanford SQuAD dataset is used as the knowledge base.
|
37 |
+
3. Llama 3.1: Llama 3.1 is a large language model used to generate responses to user questions. It is used in this project to generate responses to user questions, while also incorporating conversational context.
|
38 |
+
4. Transformers Agents 2.0: Transformers Agents 2.0 is a framework for building conversational AI systems. It is used in this project to build the chatbot.
|
39 |
+
5. Created a SquadRetrieverTool to integrate a fine-tuned BERT model into the agent, along with a TextToImageTool for a playful way to engage with the question-answering agent.
|
40 |
+
|
41 |
+
## Evaluation
|
42 |
+
|
43 |
+
* [Agent Reasoning Benchmark](https://github.com/aymeric-roucher/agent_reasoning_benchmark)
|
44 |
+
* [Hugging Face Blog: Open Source LLMs as Agents](https://huggingface.co/blog/open-source-llms-as-agents)
|
45 |
+
* [Benchmarking Transformers Agents](https://github.com/aymeric-roucher/agent_reasoning_benchmark/blob/main/benchmark_transformers_agents.ipynb)
|
46 |
+
|
47 |
+
## Results
|
48 |
+
|
49 |
+
TBD
|
50 |
+
|
51 |
+
## Limitations
|
52 |
+
|
53 |
+
TBD
|
54 |
+
|
55 |
+
## Future Work
|
56 |
+
|
57 |
+
TBD
|
58 |
+
|
59 |
+
## Acknowledgments
|
60 |
|
61 |
+
* [MemGPT](https://github.com/cpacker/MemGPT)
|
62 |
+
* [Stanford SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)
|
63 |
+
* [GPT-4](https://openai.com/gpt-4/)
|
app.py
ADDED
@@ -0,0 +1,88 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import gradio as gr
|
2 |
+
from gradio import ChatMessage
|
3 |
+
from transformers import ReactCodeAgent, HfApiEngine
|
4 |
+
from utils import stream_from_transformers_agent
|
5 |
+
from prompts import SQUAD_REACT_CODE_SYSTEM_PROMPT
|
6 |
+
from tools.squad_retriever import SquadRetrieverTool
|
7 |
+
from tools.text_to_image import TextToImageTool
|
8 |
+
from dotenv import load_dotenv
|
9 |
+
|
10 |
+
load_dotenv()
|
11 |
+
|
12 |
+
TASK_SOLVING_TOOLBOX = [
|
13 |
+
SquadRetrieverTool(),
|
14 |
+
TextToImageTool(),
|
15 |
+
]
|
16 |
+
|
17 |
+
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
|
18 |
+
# model_name = "http://localhost:1234/v1"
|
19 |
+
|
20 |
+
llm_engine = HfApiEngine(model_name)
|
21 |
+
|
22 |
+
# Initialize the agent with both tools
|
23 |
+
agent = ReactCodeAgent(
|
24 |
+
tools=TASK_SOLVING_TOOLBOX,
|
25 |
+
llm_engine=llm_engine,
|
26 |
+
system_prompt=SQUAD_REACT_CODE_SYSTEM_PROMPT,
|
27 |
+
)
|
28 |
+
|
29 |
+
def append_example_message(x: gr.SelectData, messages):
|
30 |
+
if x.value["text"] is not None:
|
31 |
+
message = x.value["text"]
|
32 |
+
if "files" in x.value:
|
33 |
+
if isinstance(x.value["files"], list):
|
34 |
+
message = "Here are the files: "
|
35 |
+
for file in x.value["files"]:
|
36 |
+
message += f"{file}, "
|
37 |
+
else:
|
38 |
+
message = x.value["files"]
|
39 |
+
messages.append(ChatMessage(role="user", content=message))
|
40 |
+
return messages
|
41 |
+
|
42 |
+
def add_message(message, messages):
|
43 |
+
messages.append(ChatMessage(role="user", content=message))
|
44 |
+
return messages
|
45 |
+
|
46 |
+
def interact_with_agent(messages):
|
47 |
+
prompt = messages[-1]['content']
|
48 |
+
for msg in stream_from_transformers_agent(agent, prompt):
|
49 |
+
messages.append(msg)
|
50 |
+
yield messages
|
51 |
+
yield messages
|
52 |
+
|
53 |
+
with gr.Blocks(fill_height=True) as demo:
|
54 |
+
chatbot = gr.Chatbot(
|
55 |
+
label="SQuAD Agent",
|
56 |
+
type="messages",
|
57 |
+
avatar_images=(
|
58 |
+
None,
|
59 |
+
"https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
|
60 |
+
),
|
61 |
+
scale=1,
|
62 |
+
bubble_full_width=False,
|
63 |
+
autoscroll=True,
|
64 |
+
show_copy_all_button=True,
|
65 |
+
show_copy_button=True,
|
66 |
+
placeholder="Enter a message",
|
67 |
+
examples=[
|
68 |
+
{
|
69 |
+
"text": "What is on top of the Notre Dame building?",
|
70 |
+
},
|
71 |
+
{
|
72 |
+
"text": "Tell me what's on top of the Notre Dame building, and draw a picture of it.",
|
73 |
+
},
|
74 |
+
{
|
75 |
+
"text": "Draw a picture of whatever is on top of the Notre Dame building.",
|
76 |
+
},
|
77 |
+
],
|
78 |
+
)
|
79 |
+
text_input = gr.Textbox(lines=1, label="Chat Message", scale=0)
|
80 |
+
chat_msg = text_input.submit(add_message, [text_input, chatbot], [chatbot])
|
81 |
+
bot_msg = chat_msg.then(interact_with_agent, [chatbot], [chatbot])
|
82 |
+
text_input.submit(lambda: "", None, text_input)
|
83 |
+
chatbot.example_select(append_example_message, [chatbot], [chatbot]).then(
|
84 |
+
interact_with_agent, [chatbot], [chatbot]
|
85 |
+
)
|
86 |
+
|
87 |
+
if __name__ == "__main__":
|
88 |
+
demo.launch()
|
bots.py
ADDED
@@ -0,0 +1,70 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
from data import Data
|
2 |
+
|
3 |
+
'''
|
4 |
+
The BotWrapper class makes it so that different types of bots can be used in the same way.
|
5 |
+
This is used in the Bots class to create a list of all bots and pass them to the frontend.
|
6 |
+
'''
|
7 |
+
class BotWrapper:
|
8 |
+
def __init__(self, bot):
|
9 |
+
self.bot = bot
|
10 |
+
|
11 |
+
def chat(self, *args, **kwargs):
|
12 |
+
methods = ['chat', 'query']
|
13 |
+
for method in methods:
|
14 |
+
if hasattr(self.bot, method):
|
15 |
+
print(f"Calling {method} method")
|
16 |
+
method_to_call = getattr(self.bot, method)
|
17 |
+
return method_to_call(*args, **kwargs).response()
|
18 |
+
raise AttributeError(f"'{self.bot.__class__.__name__}' object has none of the required methods: '{methods}'")
|
19 |
+
|
20 |
+
def stream_chat(self, *args, **kwargs):
|
21 |
+
methods = ['stream_chat', 'query']
|
22 |
+
for method in methods:
|
23 |
+
if hasattr(self.bot, method):
|
24 |
+
print(f"Calling {method} method")
|
25 |
+
method_to_call = getattr(self.bot, method)
|
26 |
+
return method_to_call(*args, **kwargs).response_gen
|
27 |
+
raise AttributeError(f"'{self.bot.__class__.__name__}' object has none of the required methods: '{methods}'")
|
28 |
+
|
29 |
+
'''
|
30 |
+
The Bots class creates the bots and passes them to the frontend.
|
31 |
+
'''
|
32 |
+
class Bots:
|
33 |
+
def __init__(self):
|
34 |
+
self.data = Data()
|
35 |
+
self.data.load_data()
|
36 |
+
self.query_engine = None
|
37 |
+
self.chat_agent = None
|
38 |
+
self.all_bots = None
|
39 |
+
self.create_bots()
|
40 |
+
|
41 |
+
def create_query_engine_bot(self):
|
42 |
+
if self.query_engine is None:
|
43 |
+
self.query_engine = BotWrapper(self.data.index.as_query_engine())
|
44 |
+
return self.query_engine
|
45 |
+
|
46 |
+
def create_chat_agent(self):
|
47 |
+
if self.chat_agent is None:
|
48 |
+
from llama_index.core.memory import ChatMemoryBuffer
|
49 |
+
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
|
50 |
+
self.chat_agent = BotWrapper(self.data.index.as_chat_engine(
|
51 |
+
chat_mode="context",
|
52 |
+
memory=memory,
|
53 |
+
context_prompt=(
|
54 |
+
"You are a chatbot, able to have normal interactions, as well as talk"
|
55 |
+
" about the questions and answers you know about."
|
56 |
+
"Here are the relevant documents for the context:\n"
|
57 |
+
"{context_str}"
|
58 |
+
"\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
|
59 |
+
)
|
60 |
+
))
|
61 |
+
return self.chat_agent
|
62 |
+
|
63 |
+
def create_bots(self):
|
64 |
+
self.create_query_engine_bot()
|
65 |
+
self.create_chat_agent()
|
66 |
+
self.all_bots = [self.query_engine, self.chat_agent]
|
67 |
+
return self.all_bots
|
68 |
+
|
69 |
+
def get_bots(self):
|
70 |
+
return self.all_bots
|
data.py
ADDED
@@ -0,0 +1,80 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import os
|
2 |
+
import json
|
3 |
+
import chromadb
|
4 |
+
from llama_index.core import VectorStoreIndex
|
5 |
+
from llama_index.vector_stores.chroma import ChromaVectorStore
|
6 |
+
from llama_index.core import StorageContext
|
7 |
+
from llama_index.core import Document
|
8 |
+
|
9 |
+
from dotenv import load_dotenv
|
10 |
+
|
11 |
+
load_dotenv() # Load OPENAI_API_KEY from .env (not included in repo)
|
12 |
+
|
13 |
+
class Data:
|
14 |
+
def __init__(self):
|
15 |
+
self.client = None
|
16 |
+
self.collection = None
|
17 |
+
self.index = None
|
18 |
+
self.load_data()
|
19 |
+
|
20 |
+
def load_data(self):
|
21 |
+
print("Loading data...")
|
22 |
+
with open('data/train-v1.1.json', 'r') as f:
|
23 |
+
raw_data = json.load(f)
|
24 |
+
|
25 |
+
extracted_question = []
|
26 |
+
extracted_answer = []
|
27 |
+
|
28 |
+
for data in raw_data['data']:
|
29 |
+
for par in data['paragraphs']:
|
30 |
+
for qa in par['qas']:
|
31 |
+
for ans in qa['answers']:
|
32 |
+
extracted_question.append(qa['question'])
|
33 |
+
extracted_answer.append(ans['text'])
|
34 |
+
|
35 |
+
documents = []
|
36 |
+
for i in range(len(extracted_question)):
|
37 |
+
documents.append(f"Question: {extracted_question[i]} \nAnswer: {extracted_answer[i]}")
|
38 |
+
|
39 |
+
self.documents = [Document(text=t) for t in documents]
|
40 |
+
self.extracted_question = extracted_question
|
41 |
+
self.extracted_answer = extracted_answer
|
42 |
+
|
43 |
+
print("Raw Data loaded")
|
44 |
+
|
45 |
+
if not os.path.exists("./chroma_db"):
|
46 |
+
print("Creating Chroma DB...")
|
47 |
+
# initialize client, setting path to save data
|
48 |
+
self.client = chromadb.PersistentClient(path="./chroma_db")
|
49 |
+
|
50 |
+
# create collection
|
51 |
+
self.collection = self.client.get_or_create_collection("simple_index")
|
52 |
+
|
53 |
+
# assign chroma as the vector_store to the context
|
54 |
+
vector_store = ChromaVectorStore(chroma_collection=self.collection)
|
55 |
+
storage_context = StorageContext.from_defaults(vector_store=vector_store)
|
56 |
+
|
57 |
+
# create your index
|
58 |
+
self.index = VectorStoreIndex.from_documents(
|
59 |
+
self.documents, storage_context=storage_context
|
60 |
+
)
|
61 |
+
print("Chroma DB created")
|
62 |
+
else:
|
63 |
+
print("Chroma DB already exists")
|
64 |
+
|
65 |
+
print("Loading index...")
|
66 |
+
# initialize client
|
67 |
+
self.client = chromadb.PersistentClient(path="./chroma_db")
|
68 |
+
|
69 |
+
# get collection
|
70 |
+
self.collection = self.client.get_or_create_collection("simple_index")
|
71 |
+
|
72 |
+
# assign chroma as the vector_store to the context
|
73 |
+
vector_store = ChromaVectorStore(chroma_collection=self.collection)
|
74 |
+
storage_context = StorageContext.from_defaults(vector_store=vector_store)
|
75 |
+
|
76 |
+
# load your index from stored vectors
|
77 |
+
self.index = VectorStoreIndex.from_vector_store(
|
78 |
+
vector_store, storage_context=storage_context
|
79 |
+
)
|
80 |
+
print("Index loaded")
|
prompts.py
ADDED
@@ -0,0 +1,108 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
SQUAD_REACT_CODE_SYSTEM_PROMPT = """You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.
|
2 |
+
To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.
|
3 |
+
To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.
|
4 |
+
|
5 |
+
Your most important tool is the `squad_retriever` tool,which can answer questions from the Stanford Question Answering Dataset (SQuAD).
|
6 |
+
Not all questions will require the `squad_retriever` tool, but whenever you need to answer a question, you should start with this tool first, and then refine your answer only as needed to align with the question and chat history.
|
7 |
+
|
8 |
+
At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.
|
9 |
+
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_action>' sequence.
|
10 |
+
During each intermediate step, you can use 'print()' to save whatever important information you will then need.
|
11 |
+
These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step.
|
12 |
+
In the end you have to return a final answer using the `final_answer` tool.
|
13 |
+
|
14 |
+
Here are a few examples using notional tools:
|
15 |
+
---
|
16 |
+
Task: "Generate an image of the oldest person in this document."
|
17 |
+
|
18 |
+
Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
|
19 |
+
Code:
|
20 |
+
```py
|
21 |
+
answer = document_qa(document=document, question="Who is the oldest person mentioned?")
|
22 |
+
print(answer)
|
23 |
+
```<end_action>
|
24 |
+
Observation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."
|
25 |
+
|
26 |
+
Thought: I will now generate an image showcasing the oldest person.
|
27 |
+
Code:
|
28 |
+
```py
|
29 |
+
image = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.")
|
30 |
+
final_answer(image)
|
31 |
+
```<end_action>
|
32 |
+
|
33 |
+
---
|
34 |
+
Task: "What is the result of the following operation: 5 + 3 + 1294.678?"
|
35 |
+
|
36 |
+
Thought: I will use python code to compute the result of the operation and then return the final answer using the `final_answer` tool
|
37 |
+
Code:
|
38 |
+
```py
|
39 |
+
result = 5 + 3 + 1294.678
|
40 |
+
final_answer(result)
|
41 |
+
```<end_action>
|
42 |
+
|
43 |
+
---
|
44 |
+
Task: "Which city has the highest population: Guangzhou or Shanghai?"
|
45 |
+
|
46 |
+
Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.
|
47 |
+
Code:
|
48 |
+
```py
|
49 |
+
population_guangzhou = search("Guangzhou population")
|
50 |
+
print("Population Guangzhou:", population_guangzhou)
|
51 |
+
population_shanghai = search("Shanghai population")
|
52 |
+
print("Population Shanghai:", population_shanghai)
|
53 |
+
```<end_action>
|
54 |
+
Observation:
|
55 |
+
Population Guangzhou: ['Guangzhou has a population of 15 million inhabitants as of 2021.']
|
56 |
+
Population Shanghai: '26 million (2019)'
|
57 |
+
|
58 |
+
Thought: Now I know that Shanghai has the highest population.
|
59 |
+
Code:
|
60 |
+
```py
|
61 |
+
final_answer("Shanghai")
|
62 |
+
```<end_action>
|
63 |
+
|
64 |
+
---
|
65 |
+
Task: "What is the current age of the pope, raised to the power 0.36?"
|
66 |
+
|
67 |
+
Thought: I will use the tool `wiki` to get the age of the pope, then raise it to the power 0.36.
|
68 |
+
Code:
|
69 |
+
```py
|
70 |
+
pope_age = wiki(query="current pope age")
|
71 |
+
print("Pope age:", pope_age)
|
72 |
+
```<end_action>
|
73 |
+
Observation:
|
74 |
+
Pope age: "The pope Francis is currently 85 years old."
|
75 |
+
|
76 |
+
Thought: I know that the pope is 85 years old. Let's compute the result using python code.
|
77 |
+
Code:
|
78 |
+
```py
|
79 |
+
pope_current_age = 85 ** 0.36
|
80 |
+
final_answer(pope_current_age)
|
81 |
+
```<end_action>
|
82 |
+
|
83 |
+
Above example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you have access to those tools (and no other tool):
|
84 |
+
|
85 |
+
<<tool_descriptions>>
|
86 |
+
|
87 |
+
<<managed_agents_descriptions>>
|
88 |
+
|
89 |
+
Here are the rules you should always follow to solve your task:
|
90 |
+
1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```<end_action>' sequence, else you will fail.
|
91 |
+
2. Use only variables that you have defined!
|
92 |
+
3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'.
|
93 |
+
4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.
|
94 |
+
5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
|
95 |
+
6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
|
96 |
+
7. Never create any notional variables in our code, as having these in your logs might derail you from the true variables.
|
97 |
+
8. You can use imports in your code, but only from the following list of modules: <<authorized_imports>>
|
98 |
+
9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist.
|
99 |
+
10. Don't give up! You're in charge of solving the task, not providing directions to solve it.
|
100 |
+
11. Only use the tools that have been provided to you.
|
101 |
+
12. Only generate an image when asked to do so.
|
102 |
+
13. If the task questions the rationale of your previous answers, explain your rationale for the previous answers and attempt to correct any mistakes in your previous answers.
|
103 |
+
|
104 |
+
As for your identity, your name is Agent SQuAD, you are an AI Agent, an expert guide to all questions and answers in the Stanford Question Answering Dataset (SQuAD), and you are SQuADtacular!
|
105 |
+
Do not use the squad_retriever tool to answer questions about yourself, such as "what is your name" or "what are you".
|
106 |
+
|
107 |
+
Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
|
108 |
+
"""
|
run.py
ADDED
@@ -0,0 +1,30 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import gradio as gr
|
2 |
+
from gradio import ChatMessage
|
3 |
+
from transformers import load_tool, ReactCodeAgent, HfEngine # type: ignore
|
4 |
+
from utils import stream_from_transformers_agent
|
5 |
+
|
6 |
+
# Import tool from Hub
|
7 |
+
image_generation_tool = load_tool("m-ric/text-to-image")
|
8 |
+
|
9 |
+
llm_engine = HfEngine("meta-llama/Meta-Llama-3-70B-Instruct")
|
10 |
+
# Initialize the agent with both tools
|
11 |
+
agent = ReactCodeAgent(tools=[image_generation_tool], llm_engine=llm_engine)
|
12 |
+
|
13 |
+
def interact_with_agent(prompt, messages):
|
14 |
+
messages.append(ChatMessage(role="user", content=prompt))
|
15 |
+
yield messages
|
16 |
+
for msg in stream_from_transformers_agent(agent, prompt):
|
17 |
+
messages.append(msg)
|
18 |
+
yield messages
|
19 |
+
yield messages
|
20 |
+
|
21 |
+
with gr.Blocks() as demo:
|
22 |
+
stored_message = gr.State([])
|
23 |
+
chatbot = gr.Chatbot(label="Agent",
|
24 |
+
type="messages",
|
25 |
+
avatar_images=(None, "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png"))
|
26 |
+
text_input = gr.Textbox(lines=1, label="Chat Message")
|
27 |
+
text_input.submit(lambda s: (s, ""), [text_input], [stored_message, text_input]).then(interact_with_agent, [stored_message, chatbot], [chatbot])
|
28 |
+
|
29 |
+
if __name__ == "__main__":
|
30 |
+
demo.launch()
|
test_bots.py
ADDED
@@ -0,0 +1,14 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import pytest
|
2 |
+
from deepeval import assert_test
|
3 |
+
from deepeval.metrics import AnswerRelevancyMetric
|
4 |
+
from deepeval.test_case import LLMTestCase
|
5 |
+
|
6 |
+
def test_case():
|
7 |
+
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)
|
8 |
+
test_case = LLMTestCase(
|
9 |
+
input="What if these shoes don't fit?",
|
10 |
+
# Replace this with the actual output from your LLM application
|
11 |
+
actual_output="We offer a 30-day full refund at no extra costs.",
|
12 |
+
retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
|
13 |
+
)
|
14 |
+
assert_test(test_case, [answer_relevancy_metric])
|
tools/squad_retriever.py
ADDED
@@ -0,0 +1,30 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
from transformers.agents.tools import Tool
|
2 |
+
from data import Data
|
3 |
+
|
4 |
+
class SquadRetrieverTool(Tool):
|
5 |
+
name = "squad_retriever"
|
6 |
+
description = "Answers questions from the Stanford Question Answering Dataset (SQuAD)."
|
7 |
+
inputs = {
|
8 |
+
"query": {
|
9 |
+
"type": "string",
|
10 |
+
"description": "The question. This should be the literal question being asked, only modified to be informed by chat history. Be sure to pass this as a keyword argument and not a dictionary.",
|
11 |
+
},
|
12 |
+
}
|
13 |
+
output_type = "string"
|
14 |
+
|
15 |
+
def __init__(self, **kwargs):
|
16 |
+
super().__init__(**kwargs)
|
17 |
+
self.data = Data()
|
18 |
+
self.query_engine = self.data.index.as_query_engine()
|
19 |
+
|
20 |
+
def forward(self, query: str) -> str:
|
21 |
+
assert isinstance(query, str), "Your search query must be a string"
|
22 |
+
|
23 |
+
response = self.query_engine.query(query)
|
24 |
+
# docs = self.data.index.similarity_search(query, k=3)
|
25 |
+
|
26 |
+
if len(response.response) == 0:
|
27 |
+
return "No answer found for this query."
|
28 |
+
return "Retrieved answer:\n\n" + "\n===Answer===\n".join(
|
29 |
+
[response.response]
|
30 |
+
)
|
tools/text_to_image.py
ADDED
@@ -0,0 +1,13 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
from transformers.agents.tools import Tool
|
2 |
+
from huggingface_hub import InferenceClient
|
3 |
+
|
4 |
+
class TextToImageTool(Tool):
|
5 |
+
description = "This is a tool that creates an image according to a prompt, which is a text description."
|
6 |
+
name = "image_generator"
|
7 |
+
inputs = {"prompt": {"type": "string", "description": "The image generator prompt. Don't hesitate to add details in the prompt to make the image look better, like 'high-res, photorealistic', etc."}}
|
8 |
+
output_type = "image"
|
9 |
+
model_sdxl = "stabilityai/stable-diffusion-xl-base-1.0"
|
10 |
+
client = InferenceClient(model_sdxl)
|
11 |
+
|
12 |
+
def forward(self, prompt):
|
13 |
+
return self.client.text_to_image(prompt)
|
tools/visual_qa.py
ADDED
@@ -0,0 +1,191 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
from PIL import Image
|
2 |
+
import base64
|
3 |
+
from io import BytesIO
|
4 |
+
import json
|
5 |
+
import os
|
6 |
+
import requests
|
7 |
+
from typing import Optional
|
8 |
+
from huggingface_hub import InferenceClient
|
9 |
+
from transformers import AutoProcessor, Tool
|
10 |
+
import uuid
|
11 |
+
import mimetypes
|
12 |
+
from dotenv import load_dotenv
|
13 |
+
|
14 |
+
# Load variables from a local .env file; override=True lets .env values
# replace any already-exported environment variables.
load_dotenv(override=True)

# Processor is used only to build the IDEFICS-2 chat-template prompt below;
# it is fetched from the Hugging Face Hub at import time.
idefics_processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
|
17 |
+
|
18 |
+
def process_images_and_text(image_path, query, client):
    """Ask an IDEFICS-2 endpoint *query* about the image at *image_path*.

    Builds a chat-template prompt, inlines the image as a base64 JPEG data
    URI, posts the payload through *client* (a huggingface_hub
    InferenceClient), and returns the first generation from the decoded
    JSON response.
    """
    messages = [
        {
            "role": "user", "content": [
                {"type": "image"},
                {"type": "text", "text": query},
            ]
        },
    ]

    prompt_with_template = idefics_processor.apply_chat_template(messages, add_generation_prompt=True)

    # load images from local directory

    # encode images to strings which can be sent to the endpoint
    def encode_local_image(image_path):
        # load image
        image = Image.open(image_path).convert('RGB')

        # Convert the image to a base64 string
        buffer = BytesIO()
        image.save(buffer, format="JPEG")  # Use the appropriate format (e.g., JPEG, PNG)
        base64_image = base64.b64encode(buffer.getvalue()).decode('utf-8')

        # add string formatting required by the endpoint
        image_string = f"data:image/jpeg;base64,{base64_image}"

        return image_string

    image_string = encode_local_image(image_path)
    # NOTE(review): replaces the template's "<image>" token with a space and
    # then relies on str.format to splice the data URI in — presumably the
    # rendered template contains a "{}" placeholder; confirm against the
    # idefics2 chat template before changing this line.
    prompt_with_images = prompt_with_template.replace("<image>", " ").format(image_string)

    payload = {
        "inputs": prompt_with_images,
        "parameters": {
            "return_full_text": False,
            "max_new_tokens": 200,
        }
    }

    # client.post returns raw bytes; the endpoint answers with a JSON list of
    # generations — return the first one.
    return json.loads(client.post(json=payload).decode())[0]
|
61 |
+
|
62 |
+
# Function to encode the image
|
63 |
+
def encode_image(image_path):
    """Return the base64 (utf-8) encoding of the image at *image_path*.

    If *image_path* is an http(s) URL, the image is first downloaded into the
    local ``downloads/`` folder under a random file name and then encoded
    from disk.
    """
    if image_path.startswith("http"):
        user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
        request_kwargs = {
            "headers": {"User-Agent": user_agent},
            "stream": True,
        }

        # Send a HTTP request to the URL
        response = requests.get(image_path, **request_kwargs)
        response.raise_for_status()
        content_type = response.headers.get("content-type", "")

        # Pick a file extension from the MIME type; fall back to a marker
        # extension when the type is unknown.
        extension = mimetypes.guess_extension(content_type)
        if extension is None:
            extension = ".download"

        fname = str(uuid.uuid4()) + extension
        # Bug fix: create the downloads folder before writing — previously
        # open() raised FileNotFoundError when the folder did not exist.
        download_dir = os.path.abspath("downloads")
        os.makedirs(download_dir, exist_ok=True)
        download_path = os.path.join(download_dir, fname)

        with open(download_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=512):
                fh.write(chunk)

        image_path = download_path

    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
|
91 |
+
|
92 |
+
# Default headers for the OpenAI chat-completions endpoint.
# NOTE: the API key is read from the environment once, at import time; if
# OPENAI_API_KEY is unset, the Authorization header becomes "Bearer None".
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"
}
|
96 |
+
|
97 |
+
|
98 |
+
def resize_image(image_path):
    """Halve the dimensions of the image at *image_path* and save a copy.

    The resized copy is written next to the original with a ``resized_``
    prefix on the file name; the new path is returned.
    """
    img = Image.open(image_path)
    width, height = img.size
    img = img.resize((int(width / 2), int(height / 2)))
    # Bug fix: prefix only the file name, not the whole path — the original
    # f"resized_{image_path}" produced invalid locations like
    # "resized_/abs/dir/x.jpg" for any image outside the working directory.
    directory, filename = os.path.split(image_path)
    new_image_path = os.path.join(directory, f"resized_{filename}")
    img.save(new_image_path)
    return new_image_path
|
105 |
+
|
106 |
+
|
107 |
+
class VisualQATool(Tool):
    """Answer questions about a local image using an IDEFICS-2 chat endpoint."""

    name = "visualizer"
    description = "A tool that can answer questions about attached images."
    inputs = {
        "question": {"description": "the question to answer", "type": "text"},
        "image_path": {
            "description": "The path to the image on which to answer the question",
            "type": "text",
        },
    }
    output_type = "text"

    # Shared endpoint client, created once at class-definition time.
    client = InferenceClient("HuggingFaceM4/idefics2-8b-chatty")

    def forward(self, image_path: str, question: Optional[str] = None) -> str:
        """Answer *question* about the image; caption it when no question is given."""
        add_note = False
        if not question:
            add_note = True
            question = "Please write a detailed caption for this image."
        try:
            output = process_images_and_text(image_path, question, self.client)
        except Exception as e:
            print(e)
            if "Payload Too Large" in str(e):
                # Retry once with a half-size image.
                new_image_path = resize_image(image_path)
                output = process_images_and_text(new_image_path, question, self.client)
            else:
                # Bug fix: any other failure previously fell through with
                # `output` unbound, raising a confusing NameError below and
                # masking the real error.
                raise

        if add_note:
            output = f"You did not provide a particular question, so here is a detailed caption for the image: {output}"

        return output
|
138 |
+
|
139 |
+
class VisualQAGPT4Tool(Tool):
    """Answer questions about a local image by calling the GPT-4o vision API."""

    name = "visualizer"
    description = "A tool that can answer questions about attached images."
    inputs = {
        "question": {"description": "the question to answer", "type": "text"},
        "image_path": {
            "description": "The path to the image on which to answer the question. This should be a local path to downloaded image.",
            "type": "text",
        },
    }
    output_type = "text"

    def forward(self, image_path: str, question: Optional[str] = None) -> str:
        """Send *question* plus the base64-encoded image to GPT-4o and return its answer."""
        if not isinstance(image_path, str):
            raise Exception("You should provide only one string as argument to this tool!")

        # When no question is given, fall back to captioning and flag the
        # answer so the caller knows what happened.
        add_note = not question
        if add_note:
            question = "Please write a detailed caption for this image."

        base64_image = encode_image(image_path)

        user_content = [
            {
                "type": "text",
                "text": question
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}"
                }
            }
        ]
        payload = {
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": user_content}],
            "max_tokens": 500
        }
        response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

        try:
            output = response.json()['choices'][0]['message']['content']
        except Exception:
            raise Exception(f"Response format unexpected: {response.json()}")

        if not add_note:
            return output
        return f"You did not provide a particular question, so here is a detailed caption for the image: {output}"
|
tools/web_surfer.py
ADDED
@@ -0,0 +1,205 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Shamelessly stolen from Microsoft Autogen team: thanks to them for this great resource!
|
2 |
+
# https://github.com/microsoft/autogen/blob/gaia_multiagent_v01_march_1st/autogen/browser_utils.py
|
3 |
+
import os
|
4 |
+
import re
|
5 |
+
from typing import Tuple, Optional
|
6 |
+
from transformers.agents.agents import Tool
|
7 |
+
import time
|
8 |
+
from dotenv import load_dotenv
|
9 |
+
import requests
|
10 |
+
from pypdf import PdfReader
|
11 |
+
from markdownify import markdownify as md
|
12 |
+
import mimetypes
|
13 |
+
from .browser import SimpleTextBrowser
|
14 |
+
|
15 |
+
# Pull API keys and settings from a local .env file, overriding any values
# already present in the environment.
load_dotenv(override=True)

# Browser identity sent with every outgoing request.
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"

# Shared text-browser configuration: ~5k characters per viewport page,
# downloads saved under ./coding, 5-minute per-request timeout.
browser_config = {
    "viewport_size": 1024 * 5,
    "downloads_folder": "coding",
    "request_kwargs": {
        "headers": {"User-Agent": user_agent},
        "timeout": 300,
    },
}

# NOTE: raises KeyError at import time if SERPAPI_API_KEY is not set.
browser_config["serpapi_key"] = os.environ["SERPAPI_API_KEY"]

# Single module-level browser instance shared by every tool below.
browser = SimpleTextBrowser(**browser_config)
|
31 |
+
|
32 |
+
|
33 |
+
# Helper functions
|
34 |
+
def _browser_state() -> Tuple[str, str]:
    """Return a ``(header, viewport)`` pair describing the shared browser's state."""
    header = f"Address: {browser.address}\n"
    if browser.page_title is not None:
        header += f"Title: {browser.page_title}\n"

    current_page = browser.viewport_current_page
    total_pages = len(browser.viewport_pages)

    # Walk history backwards, skipping the current entry, to report when this
    # address was last visited.
    for visited_address, visit_time in reversed(browser.history[:-1]):
        if visited_address == browser.address:
            header += f"You previously visited this page {round(time.time() - visit_time)} seconds ago.\n"
            break

    header += f"Viewport position: Showing page {current_page+1} of {total_pages}.\n"
    return (header, browser.viewport)
|
50 |
+
|
51 |
+
|
52 |
+
class SearchInformationTool(Tool):
    """Run an informational Google search and return the results page."""

    name = "informational_web_search"
    description = "Perform an INFORMATIONAL web search query then return the search results."
    # Single dict literal (the original built `inputs` and then mutated it).
    inputs = {
        "query": {
            "type": "text",
            "description": "The informational web search query to perform.",
        },
        "filter_year": {
            "type": "text",
            "description": "[Optional parameter]: filter the search results to only include pages from a specific year. For example, '2020' will only include pages from 2020. Make sure to use this parameter if you're trying to search for articles from a specific date!",
        },
    }
    output_type = "text"

    def forward(self, query: str, filter_year: Optional[int] = None) -> str:
        """Search Google for *query* (optionally restricted to *filter_year*)."""
        browser.visit_page(f"google: {query}", filter_year=filter_year)
        header, content = _browser_state()
        return header.strip() + "\n=======================\n" + content
|
71 |
+
|
72 |
+
|
73 |
+
class NavigationalSearchTool(Tool):
    """Google search that immediately follows the top result."""

    name = "navigational_web_search"
    description = "Perform a NAVIGATIONAL web search query then immediately navigate to the top result. Useful, for example, to navigate to a particular Wikipedia article or other known destination. Equivalent to Google's \"I'm Feeling Lucky\" button."
    inputs = {"query": {"type": "text", "description": "The navigational web search query to perform."}}
    output_type = "text"

    def forward(self, query: str) -> str:
        """Search, follow the first result link if any, and return the landing page."""
        browser.visit_page(f"google: {query}")

        # Follow the first markdown-style link in the results, if one exists.
        first_link = re.search(r"\[.*?\]\((http.*?)\)", browser.page_content)
        if first_link is not None:
            browser.visit_page(first_link.group(1))

        # Report wherever we ended up.
        header, content = _browser_state()
        return header.strip() + "\n=======================\n" + content
|
90 |
+
|
91 |
+
|
92 |
+
class VisitTool(Tool):
    """Open a URL in the shared browser and return its visible text."""

    name = "visit_page"
    # Bug fix: corrected "webapge" -> "webpage" in the agent-facing
    # parameter description.
    description = "Visit a webpage at a given URL and return its text."
    inputs = {"url": {"type": "text", "description": "The relative or absolute url of the webpage to visit."}}
    output_type = "text"

    def forward(self, url: str) -> str:
        """Navigate to *url* and return the current viewport text."""
        browser.visit_page(url)
        header, content = _browser_state()
        return header.strip() + "\n=======================\n" + content
|
102 |
+
|
103 |
+
|
104 |
+
class DownloadTool(Tool):
    """Download a binary file from a URL into ./downloads and return its path."""

    name = "download_file"
    description = """
Download a file at a given URL. The file should be of this format: [".xlsx", ".pptx", ".wav", ".mp3", ".png", ".docx"]
After using this tool, for further inspection of this page you should return the download path to your manager via final_answer, and they will be able to inspect it.
DO NOT use this tool for .pdf or .txt or .htm files: for these types of files use visit_page with the file url instead."""
    inputs = {"url": {"type": "text", "description": "The relative or absolute url of the file to be downloaded."}}
    output_type = "text"

    def forward(self, url: str) -> str:
        """Fetch *url*, save it under ./downloads, and return the saved path.

        Raises:
            Exception: if the downloaded file turns out to be pdf/txt/html
                (those should go through visit_page instead).
        """
        # arXiv abstract pages have a sibling /pdf/ URL serving the raw file.
        if "arxiv" in url:
            url = url.replace("abs", "pdf")
        response = requests.get(url)
        content_type = response.headers.get("content-type", "")
        extension = mimetypes.guess_extension(content_type)
        if extension and isinstance(extension, str):
            new_path = f"./downloads/file{extension}"
        else:
            # Bug fix: normalize `extension` to a string so the substring
            # checks below no longer raise TypeError on None.
            extension = ".object"
            new_path = "./downloads/file.object"

        # Bug fix: make sure the target directory exists before writing.
        os.makedirs("./downloads", exist_ok=True)
        with open(new_path, "wb") as f:
            f.write(response.content)

        if "pdf" in extension or "txt" in extension or "htm" in extension:
            raise Exception("Do not use this tool for pdf or txt or html files: use visit_page instead.")

        return f"File was downloaded and saved under path {new_path}."
|
131 |
+
|
132 |
+
|
133 |
+
class PageUpTool(Tool):
    """Scroll the shared browser viewport up by one page."""

    name = "page_up"
    description = "Scroll the viewport UP one page-length in the current webpage and return the new viewport content."
    output_type = "text"

    def forward(self) -> str:
        """Scroll up and return the refreshed viewport text."""
        browser.page_up()
        state_header, state_content = _browser_state()
        return state_header.strip() + "\n=======================\n" + state_content
|
142 |
+
|
143 |
+
class ArchiveSearchTool(Tool):
    """Look up the closest Wayback Machine snapshot of a URL and open it."""

    name = "find_archived_url"
    description = "Given a url, searches the Wayback Machine and returns the archived version of the url that's closest in time to the desired date."
    inputs = {
        "url": {"type": "text", "description": "The url you need the archive for."},
        "date": {"type": "text", "description": "The date that you want to find the archive for. Give this date in the format 'YYYYMMDD', for instance '27 June 2008' is written as '20080627'."}
    }
    output_type = "text"

    def forward(self, url, date) -> str:
        """Open the archived snapshot of *url* closest to *date* and return its text.

        Raises:
            Exception: if the Wayback availability API has no snapshot for *url*.
        """
        # The availability API takes `url` and `timestamp` query parameters.
        archive_url = f"https://archive.org/wayback/available?url={url}&timestamp={date}"
        response = requests.get(archive_url).json()
        try:
            closest = response["archived_snapshots"]["closest"]
        except KeyError:
            # Bug fix: was a bare `except:`, which also swallowed
            # KeyboardInterrupt/SystemExit; only a missing key means
            # "not archived".
            raise Exception("Your url was not archived on Wayback Machine, try a different url.")
        target_url = closest["url"]
        browser.visit_page(target_url)
        header, content = _browser_state()
        return f"Web archive for url {url}, snapshot taken at date {closest['timestamp'][:8]}:\n" + header.strip() + "\n=======================\n" + content
|
163 |
+
|
164 |
+
|
165 |
+
class PageDownTool(Tool):
    """Scroll the shared browser viewport down by one page."""

    name = "page_down"
    description = "Scroll the viewport DOWN one page-length in the current webpage and return the new viewport content."
    output_type = "text"

    def forward(self) -> str:
        """Scroll down and return the refreshed viewport text."""
        browser.page_down()
        state_header, state_content = _browser_state()
        return state_header.strip() + "\n=======================\n" + state_content
|
174 |
+
|
175 |
+
|
176 |
+
class FinderTool(Tool):
    """Ctrl+F-style search: jump the viewport to the first match of a string."""

    name = "find_on_page_ctrl_f"
    description = "Scroll the viewport to the first occurrence of the search string. This is equivalent to Ctrl+F."
    inputs = {"search_string": {"type": "text", "description": "The string to search for on the page. This search string supports wildcards like '*'" }}
    output_type = "text"

    def forward(self, search_string: str) -> str:
        """Search the current page for *search_string* and return the viewport."""
        find_result = browser.find_on_page(search_string)
        header, content = _browser_state()

        if find_result is not None:
            return header.strip() + "\n=======================\n" + content
        return header.strip() + f"\n=======================\nThe search string '{search_string}' was not found on this page."
|
190 |
+
|
191 |
+
|
192 |
+
class FindNextTool(Tool):
    """Jump the viewport to the next occurrence of the current search string."""

    name = "find_next"
    description = "Scroll the viewport to next occurrence of the search string. This is equivalent to finding the next match in a Ctrl+F search."
    inputs = {}
    output_type = "text"

    def forward(self) -> str:
        """Advance to the next match and return the viewport."""
        find_result = browser.find_next()
        header, content = _browser_state()

        if find_result is not None:
            return header.strip() + "\n=======================\n" + content
        return header.strip() + "\n=======================\nThe search string was not found on this page."
|
utils.py
ADDED
@@ -0,0 +1,67 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
from __future__ import annotations
|
2 |
+
|
3 |
+
from gradio import ChatMessage
|
4 |
+
from transformers.agents import ReactCodeAgent, agent_types
|
5 |
+
from typing import Generator
|
6 |
+
|
7 |
+
def pull_message(step_log: dict):
    """Translate one agent step log into a stream of gradio ChatMessages.

    Yields one message per populated section of the log: rationale,
    tool call, observation, and error, in that order.
    """
    rationale = step_log.get("rationale")
    if rationale:
        yield ChatMessage(
            role="assistant",
            metadata={"title": "🧠 Rationale"},
            content=rationale,
        )

    tool_call = step_log.get("tool_call")
    if tool_call:
        tool_name = tool_call["tool_name"]
        content = tool_call["tool_arguments"]
        # Code-interpreter arguments are raw Python; render them as a code block.
        if tool_name == "code interpreter":
            content = f"```py\n{content}\n```"
        yield ChatMessage(
            role="assistant",
            metadata={"title": f"🛠️ Used tool {tool_name}"},
            content=content,
        )

    observation = step_log.get("observation")
    if observation:
        yield ChatMessage(
            role="assistant",
            metadata={"title": "👀 Observation"},
            content=f"```\n{observation}\n```",
        )

    error = step_log.get("error")
    if error:
        yield ChatMessage(
            role="assistant",
            metadata={"title": "💥 Error"},
            content=str(error),
        )
|
36 |
+
|
37 |
+
def stream_from_transformers_agent(
    agent: ReactCodeAgent, prompt: str,
) -> Generator[ChatMessage, None, ChatMessage | None]:
    """Runs an agent with the given prompt and streams the messages from the agent as ChatMessages.

    Intermediate step logs (dicts) are expanded via pull_message; the final
    stream item is the agent's answer and is converted to a ChatMessage
    matching its modality (text / image / audio).
    """

    # Holder for the final stream item; an attribute survives the loop scope.
    class Output:
        output: agent_types.AgentType | str = None

    step_log = None
    for step_log in agent.run(prompt, stream=True, reset=len(agent.logs) == 0):  # Reset=False misbehaves if the agent has not yet been run
        if isinstance(step_log, dict):
            for message in pull_message(step_log):
                print("message", message)
                yield message

    # After the loop, step_log holds the last item yielded by agent.run —
    # the agent's final answer.
    Output.output = step_log
    if isinstance(Output.output, agent_types.AgentText):
        yield ChatMessage(
            role="assistant", content=f"**Final answer:**\n```\n{Output.output.to_string()}\n```")  # type: ignore
    elif isinstance(Output.output, agent_types.AgentImage):
        yield ChatMessage(
            role="assistant",
            content={"path": Output.output.to_string(), "mime_type": "image/png"},  # type: ignore
        )
    elif isinstance(Output.output, agent_types.AgentAudio):
        yield ChatMessage(
            role="assistant",
            content={"path": Output.output.to_string(), "mime_type": "audio/wav"},  # type: ignore
        )
    else:
        # Plain value: delivered as the generator's return value
        # (StopIteration.value), not yielded.
        return ChatMessage(role="assistant", content=Output.output)
|