---
title: web search MCP-server
sdk: gradio
sdk_version: 5.34.0
colorFrom: green
colorTo: green
short_description: MCP server for general and custom search on the web
tags:
  - mcp-server-track
app_file: app.py
pinned: true
---
# Search Tool

## Overview
Search Tool is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers.
## Demo Video

Link: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing
## Features

- **Custom Site Search**: Search within a specified list of websites.
- **Custom Domain Search**: Restrict searches to specific domains (e.g., `.edu`, `.gov`).
- **General Web Search**: Perform open web searches.
- **Content Scraping**: Extracts the main textual content from URLs using Trafilatura.
- **AI Analysis**: Summarizes and analyzes scraped content using OpenAI models.
- **Validation**: Ensures URLs are valid before processing.
- **Extensible Architecture**: Easily add new searchers, scrapers, or analyzers.
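As one way the validation step might work, here is a minimal URL check built on the standard library (`urllib.parse`); the helper name is illustrative, and the project's own `utils` validation may be stricter:

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Basic sanity check: require an http(s) scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_valid_url("https://example.com/page"))  # → True
print(is_valid_url("not a url"))                 # → False
```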
## Project Structure

```
search_tool/
├── src/
│   ├── analyzer/        # AI-powered analyzers (e.g., OpenAI)
│   ├── core/
│   │   ├── factory/     # Factories for searcher, scraper, and analyzer
│   │   ├── interface/   # Abstract interfaces for extensibility
│   │   └── types.py     # Enums and constants
│   ├── mcp_servers/     # MCP server integration
│   ├── models/          # Pydantic models for data validation
│   ├── scraper/         # Web scrapers (e.g., Trafilatura)
│   ├── searcher/        # Search engine integrations
│   ├── tools/           # User-facing tool functions
│   └── utils/           # Utility functions (e.g., URL validation)
├── test.py              # Example/test script
├── requirements.txt     # Python dependencies
├── pyproject.toml       # Project metadata and dependencies
├── .env                 # Environment variables (e.g., API keys)
└── README.md            # Project documentation
```
## Installation

1. Clone the repository:

   ```shell
   git clone https://github.com/ola172/web-search-mcp-server.git
   cd search_tool
   ```

2. Set up a virtual environment (recommended):

   ```shell
   python3 -m venv .venv
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

4. Configure environment variables:

   - Copy `.env.example` to `.env`.
   - Add your secrets.
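A `.env` file might look like the following; the exact variable names here are assumptions for illustration and should match whatever the modules actually read:

```
OPENAI_API_KEY=your-openai-key
API_KEY=your-google-api-key
SEARCH_ENGINE_ID=your-custom-search-engine-id
```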
## Usage

### Core Tools

Each tool validates input, performs the search, scrapes the results, and analyzes the content.

- **General Web Search**: `search_on_web`
- **Custom Sites Search**: `search_custom_sites`
- **Custom Domains Search**: `search_custom_domain`
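To illustrate how a custom-site or custom-domain search can restrict a query, the sketch below builds a `site:`-qualified query string. The helper name and approach are assumptions for illustration, not the project's actual implementation:

```python
def build_site_query(query: str, sites: list[str]) -> str:
    """Restrict a search query to the given sites using `site:` operators."""
    if not sites:
        return query
    site_filter = " OR ".join(f"site:{s}" for s in sites)
    return f"{query} ({site_filter})"


# Example: restrict a search to two sites.
print(build_site_query("vector search", ["docs.python.org", "wikipedia.org"]))
# → vector search (site:docs.python.org OR site:wikipedia.org)
```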
### MCP Server Integration

The project includes an MCP server (`web_search_server.py`) that exposes the search tools as MCP tools.
## Extending the Framework

- **Add a new searcher**: Implement `SearchInterface` and register it in `SearcherFactory`.
- **Add a new scraper**: Implement `ScraperInterface` and register it in `ScraperFactory`.
- **Add a new analyzer**: Implement `AnalyzerInterface` and register it in `AnalyzerFactory`.
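The interface-plus-factory pattern described above can be sketched roughly as follows; the method names and registration mechanism are illustrative assumptions, and the project's real interfaces may differ:

```python
from abc import ABC, abstractmethod


class SearchInterface(ABC):
    """Abstract base class every searcher must implement."""

    @abstractmethod
    def search(self, query: str, num_results: int = 5) -> list[str]:
        """Return a list of result URLs for the query."""


class SearcherFactory:
    """Registry mapping searcher names to implementations."""

    _registry: dict[str, type[SearchInterface]] = {}

    @classmethod
    def register(cls, name: str, searcher_cls: type[SearchInterface]) -> None:
        cls._registry[name] = searcher_cls

    @classmethod
    def create(cls, name: str) -> SearchInterface:
        return cls._registry[name]()


class DummySearcher(SearchInterface):
    """Trivial searcher, used here only to demonstrate registration."""

    def search(self, query: str, num_results: int = 5) -> list[str]:
        return [f"https://example.com/?q={query}"]


SearcherFactory.register("dummy", DummySearcher)
searcher = SearcherFactory.create("dummy")
print(searcher.search("gradio"))  # → ['https://example.com/?q=gradio']
```

A new scraper or analyzer would follow the same shape: subclass the interface, then register the class with the corresponding factory.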
## Configuration

- **API Keys**: Store sensitive keys (e.g., OpenAI) in the `.env` file.
- **Search Engine IDs**: For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules.
## Dependencies

- `openai`
- `trafilatura`
- `pydantic`
- `googlesearch-python`
- `python-dotenv`
- `google-api-python-client`

See `requirements.txt` for the full list.
## License
This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used.
## Acknowledgements
- OpenAI
- Trafilatura
- Google Custom Search
For questions or contributions, please open an issue or pull request.