as-cle-bert
posted an update 1 day ago
Let's pipe some data from the web into our vector database, shall we? 🤠

With ingest-anything v1.3.0 (https://github.com/AstraBert/ingest-anything) you can now scrape content starting from plain URLs, extract the text from it, chunk it and put it into your favorite LlamaIndex-compatible database! 🕸️

You can do it thanks to crawlee by Apify, an open-source crawling library for Python and JavaScript that handles all the data flow from the web. ingest-anything then combines it with BeautifulSoup, PdfItDown and PyMuPdf to scrape HTML files, convert them to PDF and extract the text, hassle-free! 😸
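To make the scrape, extract and chunk flow above a bit more concrete, here is a minimal standard-library sketch. It is not ingest-anything's API and it does not use crawlee, BeautifulSoup, PdfItDown or PyMuPdf: the names `TextExtractor`, `html_to_text` and `chunk_words` are hypothetical stand-ins, and the fixed-size word chunking is a simplification assumed for illustration only.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Hypothetical helper: reduce an HTML page to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Hypothetical helper: split text into overlapping chunks of `size` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous chunk.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


if __name__ == "__main__":
    page = "<html><body><h1>Hello</h1><script>x=1</script><p>web data</p></body></html>"
    for chunk in chunk_words(html_to_text(page), size=2, overlap=1):
        print(chunk)
```

In a real pipeline the page would come from a crawler rather than a string literal, and each chunk would be embedded and written to the vector database; those steps are left out here on purpose.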

Check the attached code snippet if you're curious about how to get started 🎬

PS: Don't tell anybody, but this release also has another gem... It supports OpenAI models for agentic chunking, following the new releases of Chonkie 🦛✨

If you don't want to miss out on the new features, leave us a little star on GitHub ➡️ https://github.com/AstraBert/ingest-anything
And join our Discord community! ➡️ https://discord.gg/kDqHNjks

Whoa. Reliable open-source crawling software is a big win. I'll take it for a spin, but I'm optimistic, as this is the kind of thing I (and every other AI builder) have been building for years to avoid paying for FireCrawl.