---
title: web search MCP-server
sdk: gradio
colorFrom: green
colorTo: green
short_description: MCP server for general and custom search on the web
sdk_version: 5.34.0
tags:
  - mcp-server-track
app_file: app.py
pinned: true
---

# Search Tool

## Overview

Search Tool is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers.

## Demo video

Link: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing

## Features

- **Custom Site Search**: Search within a specified list of websites.
- **Custom Domain Search**: Restrict searches to specific domains (e.g., `.edu`, `.gov`).
- **General Web Search**: Perform open web searches.
- **Content Scraping**: Extract main textual content from URLs using trafilatura.
- **AI Analysis**: Summarize and analyze scraped content using OpenAI models.
- **Validation**: Ensure URLs are valid before processing.
- **Extensible Architecture**: Easily add new searchers, scrapers, or analyzers.

## Project Structure

```
search_tool/
├── src/
│   ├── analyzer/         # AI-powered analyzers (e.g., OpenAI)
│   ├── core/
│   │   ├── factory/      # Factories for searcher, scraper, and analyzer
│   │   ├── interface/    # Abstract interfaces for extensibility
│   │   └── types.py      # Enums and constants
│   ├── mcp_servers/      # MCP server integration
│   ├── models/           # Pydantic models for data validation
│   ├── scraper/          # Web scrapers (e.g., Trafilatura)
│   ├── searcher/         # Search engine integrations
│   ├── tools/            # User-facing tool functions
│   └── utils/            # Utility functions (e.g., URL validation)
├── test.py               # Example/test script
├── requirements.txt      # Python dependencies
├── pyproject.toml        # Project metadata and dependencies
├── .env                  # Environment variables (e.g., API keys)
└── README.md             # Project documentation
```

## Installation

1. Clone the repository:

   ```shell
   git clone https://github.com/ola172/web-search-mcp-server.git
   cd search_tool
   ```

2. Set up a virtual environment (recommended):

   ```shell
   python3 -m venv .venv
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

4. Configure environment variables:

   - Copy `.env.example` to `.env`
   - Add your secrets:
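
A minimal `.env` might look like the following. `API_KEY` and `SEARCH_ENGINE_ID` are the names used in the Configuration section below; `OPENAI_API_KEY` is an assumption based on the OpenAI dependency, so check the modules under `src/` for the exact names the code reads:

```
OPENAI_API_KEY=your-openai-key
API_KEY=your-google-api-key
SEARCH_ENGINE_ID=your-search-engine-id
```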

## Usage

### Core Tools

Each tool validates input, performs the search, scrapes the results, and analyzes the content.

- **General Web Search**: `search_on_web`
- **Custom Sites Search**: `search_custom_sites`
- **Custom Domains Search**: `search_custom_domain`
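
As an illustration of that validate → search → scrape → analyze pipeline, the sketch below implements only the validation stage using the standard library; the real tools wrap the project's searcher, scraper, and analyzer classes, and their exact signatures may differ:

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Minimal URL check, mirroring the validation step each tool performs."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def filter_search_results(results: list[str]) -> list[str]:
    """Toy stand-in for the first pipeline stage: drop results whose URLs
    would fail validation before any scraping or analysis happens."""
    return [url for url in results if is_valid_url(url)]


hits = filter_search_results(["https://example.com/a", "not-a-url", "ftp://x/y"])
```

In the real tools, each surviving URL would then be passed to the scraper and the analyzer in turn.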

### MCP Server Integration

The project includes an MCP server (`web_search_server.py`) that exposes the search tools as MCP tools.

## Extending the Framework

- **Add a new searcher**: Implement the `SearchInterface` and register it in `SearcherFactory`.
- **Add a new scraper**: Implement the `ScraperInterface` and register it in `ScraperFactory`.
- **Add a new analyzer**: Implement the `AnalyzerInterface` and register it in `AnalyzerFactory`.
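
All three extension points follow the same pattern. As an illustrative sketch only (the project's actual interface and factory signatures may differ), a searcher plug-in could look like this:

```python
from abc import ABC, abstractmethod


class SearchInterface(ABC):
    """Illustrative stand-in for the project's abstract searcher contract."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return a list of result URLs for the query."""


class SearcherFactory:
    """Illustrative registry-based factory; the real one may differ in detail."""

    _registry: dict[str, type[SearchInterface]] = {}

    @classmethod
    def register(cls, name: str, searcher_cls: type[SearchInterface]) -> None:
        cls._registry[name] = searcher_cls

    @classmethod
    def create(cls, name: str) -> SearchInterface:
        return cls._registry[name]()


class EchoSearcher(SearchInterface):
    """Trivial searcher used only to demonstrate registration."""

    def search(self, query: str) -> list[str]:
        return [f"https://example.com/?q={query}"]


# Register once, then construct by name anywhere in the codebase.
SearcherFactory.register("echo", EchoSearcher)
results = SearcherFactory.create("echo").search("mcp")
```

The registry keeps the factory open for extension: adding a searcher never requires editing the factory itself.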

## Configuration

- **API Keys**: Store sensitive keys (e.g., OpenAI) in the `.env` file.
- **Search Engine IDs**: For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules.
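
A common pattern for keys like these is to fail fast at startup when one is missing. The helper below is a sketch using only the standard library; in the project itself, python-dotenv loads `.env` into the environment before any such check runs:

```python
import os


def get_required_env(name: str) -> str:
    """Read a required setting from the environment, failing fast if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demo with a placeholder value; in real use the value comes from .env.
os.environ.setdefault("SEARCH_ENGINE_ID", "demo-engine-id")
engine_id = get_required_env("SEARCH_ENGINE_ID")
```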

## Dependencies

- `openai`
- `trafilatura`
- `pydantic`
- `googlesearch-python`
- `python-dotenv`
- `google-api-python-client`

See `requirements.txt` for the full list.

## License

This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used.

## Acknowledgements

- OpenAI
- Trafilatura
- Google Custom Search

For questions or contributions, please open an issue or pull request.