Supercharge your Retrieval-Augmented Generation (RAG) pipelines with ease! I just finished working on the **RAG-Ready Content Scraper**, which combines two very useful tools (RAG-Scraper and Repomix) and is now available as a Hugging Face Space!
## What can it do?
This intuitive application helps you effortlessly gather and process content from various sources:
* **Webpages**: Scrape content from any URL (with RAG-Scraper). You can even control the scraping depth to fetch linked pages!
* **GitHub Repositories**: Process entire GitHub repos (using the power of Repomix) by simply providing a URL or a `username/repo` ID.
## Various Output Formats
Convert the scraped content into a variety of RAG-friendly formats:
* **Markdown** (.md)
* **JSON** (.json)
* **CSV** (.csv)
* **Plain Text** (.txt)
* **PDF** (.pdf)
Perfect for building datasets and knowledge bases, and for feeding your LLMs high-quality, structured information.
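
To give a feel for what these formats look like downstream, here is a minimal sketch of serializing scraped pages with only the Python standard library. It is not the Space's actual code: the record shape (`url`, `title`, `text`), the sample pages, and the helper names are all assumptions made for illustration. PDF is left out because it needs a third-party library.

```python
import csv
import io
import json

# Hypothetical record shape and sample data; the Space's real internal
# structure may differ.
pages = [
    {"url": "https://example.com/", "title": "Home", "text": "Welcome to the example site."},
    {"url": "https://example.com/docs", "title": "Docs", "text": "How to use the example site."},
]


def to_json(records):
    """Serialize the records as a pretty-printed JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialize the records as CSV with one row per page."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "title", "text"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_markdown(records):
    """Render each page as a Markdown section with its source URL."""
    sections = [f"## {r['title']}\n\nSource: {r['url']}\n\n{r['text']}" for r in records]
    return "\n\n".join(sections) + "\n"


def to_text(records):
    """Concatenate the page bodies as plain text, separated by blank lines."""
    return "\n\n".join(r["text"] for r in records) + "\n"


if __name__ == "__main__":
    print(to_markdown(pages))
```

JSON and CSV keep the source URL next to each chunk, which is handy when you want your RAG answers to cite where the text came from. Markdown and plain text are easier to feed straight into a text splitter.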
## Hope you enjoy!
Ready to streamline your RAG data preparation?
**Visit the RAG-Ready Content Scraper on Hugging Face Spaces:** [https://huggingface.co/spaces/CultriX/RAG-Scraper](https://huggingface.co/spaces/CultriX/RAG-Scraper)
---
Feedback and feature requests are welcome! Let's build better RAG together.