Let's pipe some data from the web into our vector database, shall we?
With the new release of ingest-anything (https://github.com/AstraBert/ingest-anything) you can now scrape content starting from nothing but URLs, extract the text from it, chunk it and put it into your favorite LlamaIndex-compatible database!
You can do it thanks to crawlee by Apify, an open-source crawling library for Python and JavaScript that handles all the data flow from the web. ingest-anything then combines it with BeautifulSoup, PdfItDown and PyMuPDF to scrape HTML files, convert them to PDF and extract the text, hassle-free!
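To make the "extract the text, then chunk it" idea concrete, here is a minimal sketch using only the Python standard library. It is not ingest-anything's API or implementation: it skips the crawling and PDF-conversion steps entirely, and the names `TextExtractor`, `html_to_text` and `chunk_words` are hypothetical helpers made up for illustration. It assumes you already have the HTML of a page as a string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `chunk_size` words sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    page = "<html><body><h1>Hello</h1><p>Some scraped text.</p></body></html>"
    for chunk in chunk_words(html_to_text(page), chunk_size=3, overlap=1):
        print(chunk)
```

In a real pipeline each chunk would then be embedded and written to the vector database. Word-based chunks with a small overlap are just the simplest strategy to show here, not necessarily what the library uses.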
Check the attached code snippet if you're curious about how to get started.
PS: Don't tell anybody, but this release also has another gem... It supports OpenAI models for agentic chunking, following the new releases of Chonkie.
If you don't want to miss out on the new features, leave us a little star on GitHub: https://github.com/AstraBert/ingest-anything
And join our Discord community: https://discord.gg/kDqHNjks