metadata
title: Legal-Agent
emoji: 🌖
colorFrom: gray
colorTo: blue
sdk: docker
sdk_version: 5.18.0
app_file: app.py
pinned: false

Legal Agent Codebase

Table of Contents

(DEMO README)

About The Project

This project implements a document retrieval and question answering system using LangChain, Google's Gemini-1.5-flash language model, and a FAISS vector database. The system is designed to handle legal queries: it answers from a local knowledge base first and falls back to web search when the retrieved documents are not sufficient.

Functionality

The system follows a multi-stage workflow:

  1. Query Input: The user provides a query (e.g., a legal question).

  2. FAISS Retrieval: The query is embedded using Google Generative AI embeddings, and the FAISS index (a local vector database) is queried to retrieve the most relevant documents.

  3. Grounded Response Generation: A DocSummarizerPipeline summarizes the retrieved documents, focusing on the user's query. This summary attempts to directly answer the question using only the retrieved documents.

  4. Response Evaluation: An IntermediateStateResponseEvaluator assesses the quality and completeness of the generated response. This evaluation uses the Gemini model to determine if the response sufficiently answers the query.

  5. Web Search (Conditional): If the generated response is deemed insufficient, a WebSearchAgent performs a web search using DuckDuckGo to gather additional information. The results are then incorporated into the final response.

  6. Response Output: The final answer, either from the document summary or the combined document/web search result, is returned to the user.
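The control flow of the six stages above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function `answer_query`, its parameter names, and the `fake_*` stand-ins are hypothetical, and each stage is passed in as a callable so the sketch runs without an API key or a FAISS index.

```python
from typing import Callable, List


def answer_query(
    query: str,
    retrieve: Callable[[str], List[str]],
    summarize: Callable[[str, List[str]], str],
    is_sufficient: Callable[[str, str], bool],
    web_search: Callable[[str], str],
) -> str:
    """Run the stages in order, using web search only when the draft falls short."""
    documents = retrieve(query)              # stage 2: FAISS retrieval
    draft = summarize(query, documents)      # stage 3: grounded response
    if is_sufficient(query, draft):          # stage 4: response evaluation
        return draft                         # stage 6: document-only answer
    web_context = web_search(query)          # stage 5: conditional web search
    return f"{draft}\n\nAdditional web context:\n{web_context}"


# Hypothetical stand-ins for the real components (assumptions, not project code).
def fake_retrieve(query: str) -> List[str]:
    return ["Placeholder passage about contract formation."]


def fake_summarize(query: str, documents: List[str]) -> str:
    return " ".join(documents)


def fake_judge(query: str, draft: str) -> bool:
    # Treat the draft as sufficient only if it mentions the word "contract".
    return "contract" in draft.lower()


def fake_web_search(query: str) -> str:
    return f"web results for: {query}"


if __name__ == "__main__":
    print(answer_query("What is a contract?", fake_retrieve, fake_summarize,
                       fake_judge, fake_web_search))
```

Passing the stages as callables keeps the branching logic separate from the model and index calls, which makes the fallback path easy to exercise in isolation.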

Code Structure

The code is organized into several classes and functions:

  • DocumentRetriever: Loads and interacts with the FAISS index, retrieving relevant documents based on a query.

  • FaissRetriever: Loads and interacts with the FAISS index, retrieving relevant documents based on a query.

  • DocSummarizerPipeline: Summarizes retrieved documents using the Gemini model, generating a concise answer focused on the user's query. It uses a carefully crafted prompt to ensure the response is structured and informative.

  • WebSearchAgent: Performs web searches using DuckDuckGo and integrates the results into the response.

  • IntermediateStateResponseEvaluator: Evaluates the quality of the generated response using the Gemini model, determining if additional information is needed.

  • State (TypedDict): Defines the data structure for passing information between stages of the workflow.

  • Workflow Functions (faiss_content_retriever, grounded_response, response_judge, web_response): These functions represent individual nodes in the LangGraph workflow.

  • StateGraph: Defines the workflow using LangGraph, managing the flow of data between the different stages. Conditional logic is implemented to determine whether a web search is necessary.

  • run_user_query: The main function that takes a user query and processes it through the LangGraph workflow.

    Agent Workflow:

(Image: agent workflow diagram)
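As a rough illustration of how the State and the conditional step fit together, the sketch below defines a shared state and a routing function in plain Python. The field names (`query`, `documents`, `response`, `is_sufficient`), the helper `merge`, and the route labels are assumptions made for this example and may not match the repository. In LangGraph, a function shaped like `route_after_judge` is the kind of callable passed to `StateGraph.add_conditional_edges`, and node functions return partial state updates that are merged into the shared state.

```python
from typing import List, TypedDict


class State(TypedDict, total=False):
    # Illustrative field names; the project's actual keys may differ.
    query: str
    documents: List[str]
    response: str
    is_sufficient: bool


def route_after_judge(state: State) -> str:
    """Pick the next node: finish if the draft is sufficient, else search the web."""
    return "end" if state.get("is_sufficient", False) else "web_response"


def merge(state: State, update: State) -> State:
    """Combine a node's partial update with the existing state (update wins)."""
    return {**state, **update}


if __name__ == "__main__":
    state: State = {"query": "What is a contract?"}
    state = merge(state, {"response": "draft answer", "is_sufficient": False})
    print(route_after_judge(state))
```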

Dependencies

The code relies on several libraries:

  • langgraph
  • langchain-core
  • langchain-google-genai
  • IPython
  • dotenv
  • google.generativeai
  • langchain.chains.question_answering
  • langchain.prompts
  • langchain.vectorstores
  • langchain_community.tools
  • langchain.agents

Working with the code

I have commented most of the necessary information in the respective files.

To run this project locally, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Rajarshi12321/legal-agent.git
    
  2. Create a virtual environment (optional but recommended). It is good practice to create a virtual environment to manage project dependencies. Run the following command:

    conda create -p <Environment_Name> python==<python version> -y
    

    Example:

    conda create -p venv python=3.9 -y 
    

    Note:

    • Use python=3.9 or above so that LangChain works properly; otherwise you may get unexpected errors.
  3. Activate the virtual environment (optional):

    conda activate <Environment_Name>/
    

    Example:

    conda activate venv/
    
  4. Install Dependencies

    • Run the following command to install project dependencies:
      pip install -r requirements.txt
      

    Ensure you have Python installed on your system (Python 3.9 or higher is recommended).
    Once the dependencies are installed, you're ready to use the project.

  5. Create a .env file in the root directory and add your Gemini API key as follows:

    GOOGLE_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    
  6. Run the Chainlit app: execute the following command in your terminal.

    chainlit run app.py 
    
  7. Access the app: open your web browser and navigate to http://localhost:8000/ to use the Legal Agent app.
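If the app fails at startup, a missing key from step 5 is a common cause. The sketch below is a small fail-fast check, assuming the .env file has already been loaded into the process environment (for example by python-dotenv, which is listed under Dependencies). The helper name `require_api_key` is hypothetical and not part of the project.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "GOOGLE_API_KEY") -> str:
    """Return the key, or raise a clear error pointing back to the .env step."""
    env = os.environ if env is None else env
    # Strip whitespace and surrounding double quotes, as written in the .env example.
    value = env.get(name, "").strip().strip('"')
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to the .env file in the project root."
        )
    return value


if __name__ == "__main__":
    try:
        require_api_key()
        print("GOOGLE_API_KEY found.")
    except RuntimeError as err:
        print(err)
```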

Deploying the project from your side (In AWS)

I have already added a GitHub Actions file at .github\workflows\main.yaml. To use it, you need the following prerequisites:

  1. Create an IAM user from your AWS account
    1. Log in to the AWS console.
    2. Create an IAM user for deployment

    # with specific access

    1. EC2 access: EC2 is the virtual machine

    2. ECR: Elastic Container Registry, used to save your Docker image in AWS

    # Policy: (select these policies when creating the user)

    1. AmazonEC2ContainerRegistryFullAccess

    2. AmazonEC2FullAccess

  1. Building the full infrastructure using Terraform

    First, configure your AWS account with the credentials of the IAM user you created by running the command aws configure, so that Terraform knows which account to use.

    NOTE: If you don't want to use Terraform to build the infrastructure, you can also build it manually from the AWS console.
    For reference, watch this video from the 3:47:20 time frame: Youtube link

    Go to the Terraform directory, infrastructure\terraform, and execute the following commands:

    Initializing Terraform

    terraform init 
    

    Forming a plan according to the described infrastructure

    terraform plan 
    

    Applying the planned infrastructure to build necessary resources

    terraform apply -auto-approve
    



  2. After this, you need to configure your EC2 instance to install Docker.
    Run the following commands:

    sudo apt-get update -y
    
    sudo apt-get upgrade
      
    
    curl -fsSL https://get.docker.com -o get-docker.sh
    
    sudo sh get-docker.sh
    
    sudo usermod -aG docker ubuntu
    
    newgrp docker
    
  3. After this, you need to configure the self-hosted runner for GitHub Actions so that it can deploy to the EC2 instance:

    Check out the Youtube video for reference from the 3:54:38 time frame

    The commands for setting up the self-hosted runner will look like this:

    (NOTE: Use the commands shown for your own Actions runner; the commands below are just for reference)

    mkdir actions-runner && cd actions-runner
    
    curl -o actions-runner-linux-x64-2.316.1.tar.gz -L https://github.com/actions/runner/releases/download/v2.316.1/actions-runner-linux-x64-2.316.1.tar.gz
    
    
    echo "d62de2400eeeacd195db91e2ff011bfb646cd5d85545e81d8f78c436183e09a8  actions-runner-linux-x64-2.316.1.tar.gz" | shasum -a 256 -c
    
    
    tar xzf ./actions-runner-linux-x64-2.316.1.tar.gz
    
    ./config.sh --url https://github.com/Rajarshi12321/main_app_deploy --token <YOUR_RUNNER_TOKEN>
    
    ./run.sh
    

    Name the runner: self-hosted

  4. Follow the Youtube video from the 3:57:14 time frame to see which secret keys and values to add to your GitHub Actions secrets. Additionally, you have to add GOOGLE_API_KEY to the secrets, using the same key name as in .env and your API key as the value.

  5. Finally, after doing all this, you can run your GitHub Actions workflow, which follows the instructions in .github\workflows\main.yaml

    Description: About the deployment by main.yaml

    1. Build docker image of the source code

    2. Push your docker image to ECR

    3. Launch Your EC2

    4. Pull Your image from ECR in EC2

    5. Launch your docker image in EC2

Now, changing any file except the readme.md file and the assets folder (which contains images for the readme) will trigger the GitHub Actions CI/CD pipeline.

NOTE: Keep an eye on the state of the self-hosted runner. If it is idle or offline, fix this by connecting to the EC2 instance and running the run.sh file:

cd actions-runner

./run.sh

Contributing

I welcome contributions to improve the functionality and performance of the app. If you'd like to contribute, please follow these guidelines:

  1. Fork the repository and create a new branch for your feature or bug fix.

  2. Make your changes and ensure that the code is well-documented.

  3. Test your changes thoroughly to maintain app reliability.

  4. Create a pull request, detailing the purpose and changes made in your contribution.

Contact

Rajarshi Roy - [email protected]

License

This project is licensed under the MIT License. Feel free to modify and distribute it as per the terms of the license.

I hope this README provides you with the necessary information to get started with the road to Generative AI with Google Gemini and Langchain.