--- title: web search MCP-server sdk: gradio colorFrom: green colorTo: green short_description: MCP server for general and custom search on web sdk_version: 5.34.0 tags: - mcp-server-track app_file: app.py pinned: true --- # Search Tool ## Overview **Search Tool** is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers. ## Demo video Link: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing ![Demo](./demo.webm) ## Features - **Custom Site Search:** Search within a specified list of websites. - **Custom Domain Search:** Restrict searches to specific domains (e.g., `.edu`, `.gov`). - **General Web Search:** Perform open web searches. - **Content Scraping:** Extracts main textual content from URLs using [trafilatura](https://trafilatura.readthedocs.io/). - **AI Analysis:** Summarizes and analyzes scraped content using OpenAI models. - **Validation:** Ensures URLs are valid before processing. - **Extensible Architecture:** Easily add new searchers, scrapers, or analyzers. ## Project Structure ``` search_tool/ ├── src/ │ ├── analyzer/ # AI-powered analyzers (e.g., OpenAI) │ ├── core/ │ │ ├── factory/ # Factories for searcher, scraper, │ │ ├── interface/ # Abstract interfaces for extensibility │ │ └── types.py # Enums and constants │ ├── mcp_servers/ # MCP server integration │ ├── models/ # Pydantic models for data validation │ ├── scraper/ # Web scrapers (e.g., Trafilatura) │ ├── searcher/ # Search engine integrations │ ├── tools/ # User-facing tool functions │ └── utils/ # Utility functions (e.g., URL validation) ├── test.py # Example/test script ├── requirements.txt # Python dependencies ├── pyproject.toml # Project metadata and dependencies ├── .env # Environment variables (e.g., API keys) └── README.md # Project documentation ``` ## Installation 1. **Clone the repository:** ```sh git clone https://github.com/ola172/web-search-mcp-server.git cd search_tool ``` 2. **Set up a virtual environment (recommended):** ```sh python3 -m venv .venv source .venv/bin/activate ``` 3. **Install dependencies:** ```sh pip install -r requirements.txt ``` 4. **Configure environment variables:** - Copy `.env.example` to `.env` - Add your secrets: ## Usage ### Core Tools Each tool validates input, performs the search, scrapes the results, and analyzes the content. - **General Web Search:** `search_on_web` - **Custom Sites Search:** `search_custom_sites` - **Custom Domains Search:** `search_custom_domain` ### MCP Server Integration The project includes an MCP server (`web_search_server.py`) for exposing search tools as mcp tools. ## Extending the Framework - **Add a new searcher:** Implement the `SearchInterface` and register it in `SearcherFactory`. - **Add a new scraper:** Implement the `ScraperInterface` and register it in `ScraperFactory`. - **Add a new analyzer:** Implement the `AnalyzerInterface` and register it in `AnalyzerFactory`. ## Configuration - **API Keys:** Store sensitive keys (e.g., OpenAI) in the `.env` file. - **Search Engine IDs:** For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules. ## Dependencies - `openai` - `trafilatura` - `pydantic` - `googlesearch-python` - `python-dotenv` - `google-api-python-client` See `requirements.txt` for the full list. ## License This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used. ## Acknowledgements - OpenAI - Trafilatura - Google Custom Search For questions or contributions, please open an issue or pull request.