# Web Scraper Project Instructions

This is a Python Gradio application for web scraping that:

- Scrapes text content from websites
- Formats content as markdown
- Generates sitemaps from page links
- Provides MCP (Model Context Protocol) server functionality

## Key Libraries

- gradio[mcp]: For the web interface and MCP server capabilities
- requests: For HTTP requests
- beautifulsoup4: For HTML parsing
- markdownify: For converting HTML to markdown
- urllib.parse: For URL handling

## Project Structure

- `app.py`: Main web interface application
- `mcp_server.py`: MCP server that exposes tools for AI integration

## MCP Tools

The MCP server exposes three main tools:

- `scrape_content`: Extract website content as markdown
- `generate_sitemap`: Create sitemap from page links
- `analyze_website`: Complete analysis with content and sitemap

## Code Style

- Use type hints where appropriate
- Include proper error handling for web requests
- Follow PEP 8 style guidelines
- Add docstrings for functions with clear parameter descriptions
- MCP functions should have descriptive docstrings as they become tool descriptions
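
As a rough illustration of the code style guidelines applied to a sitemap tool, the sketch below builds a markdown link list from a page's HTML. It is a minimal example under stated assumptions, not the project's actual implementation: the signature `generate_sitemap(html, base_url)` is hypothetical (the real tool may take only a URL and fetch the page with `requests`), the helper class `_LinkCollector` is invented for this example, and the standard library's `html.parser` stands in for `beautifulsoup4` so the snippet runs without third-party packages.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def generate_sitemap(html: str, base_url: str) -> str:
    """Create a markdown sitemap from the links found in a page.

    Args:
        html: Raw HTML of the page (fetching is left out of this sketch).
        base_url: Absolute URL of the page, used to resolve relative links.

    Returns:
        A markdown bullet list of unique absolute http(s) links, or an
        error message if base_url is not an absolute http(s) URL.
    """
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Error: invalid base URL: {base_url!r}"

    collector = _LinkCollector()
    collector.feed(html)

    seen: set[str] = set()
    links: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https") and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)

    return "\n".join(f"- {link}" for link in links)
```

Returning an error string instead of raising is one possible choice for a function exposed as an MCP tool, since the calling model receives a readable message; raising an exception would be an equally reasonable design. The docstring is written in the `Args:`/`Returns:` form because, as noted in the code style section, it becomes the tool description.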