Clelia Astra Bertelli

as-cle-bert

AI & ML interests

Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source

Recent Activity

posted an update 1 day ago
Let's pipe some data from the web into our vector database, shall we? 🤠
With ingest-anything v1.3.0 (https://github.com/AstraBert/ingest-anything) you can now scrape content starting simply from URLs, extract the text from it, chunk it and put it into your favorite LlamaIndex-compatible database! 🕸️
You can do it thanks to Crawlee by Apify, an open-source crawling library for Python and JavaScript that handles all the data flow from the web: ingest-anything then combines it with BeautifulSoup, PdfItDown and PyMuPDF to scrape HTML files, convert them to PDF and extract the text - hassle-free! 😸
Check the attached code snippet if you're curious to know how to get started 🎬
PS: Don't tell anybody, but this release also has another gem... It supports OpenAI models for agentic chunking, following the new releases of Chonkie 🦛✨
If you don't want to miss out on the new features, leave us a little star on GitHub ➡️ https://github.com/AstraBert/ingest-anything
And join our Discord community! ➡️ https://discord.gg/kDqHNjks
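The flow described in the post above (take HTML from the web, extract its text, split it into chunks) can be sketched with the Python standard library alone. This is not the ingest-anything API: the real package relies on Crawlee, BeautifulSoup, PdfItDown, PyMuPDF and Chonkie, and the names `TextExtractor`, `extract_text` and `chunk_words` below are hypothetical helpers made up for illustration. Fetching pages and writing to a vector database are left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows of chunk_size that share overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk would then be embedded and stored in the vector database; fixed word windows are only a stand-in for the smarter chunkers that Chonkie provides.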
posted an update 10 days ago
Hey there, ingest-anything v1.0.0 just dropped with major changes:
✅ Embeddings: now works with Sentence Transformers, Jina AI, Cohere, OpenAI, and Model2Vec. All powered via Chonkie’s AutoEmbeddings. No more local-only limitations 🙌
✅ Vector DBs: now supports all LlamaIndex-compatible backends. Think: Qdrant, Pinecone, Weaviate, Milvus, etc. No more bottlenecks 🔓
✅ File parsing: now plugs into any LlamaIndex-compatible data loader. Using LlamaParse, Docling or your own setup? You’re covered.
Curious to know more? Try it out! 👉 https://github.com/AstraBert/ingest-anything
posted an update 11 days ago
One of the biggest challenges I've been facing since I started developing [PdfItDown](https://github.com/AstraBert/PdfItDown) was correctly handling the conversion of files like Excel sheets and CSVs: table conversion was bad and messy, almost unusable for downstream tasks 🫣
That's why today I'm excited to introduce readers, the new feature of PdfItDown v1.4.0! 🎉
With readers, you can choose among three (for now 👀) flavors of text extraction and conversion to PDF:
- Docling, which does a fantastic job with presentations, spreadsheets and Word documents 🦆
- LlamaParse by LlamaIndex, suitable for more complex and articulated documents, with a mixture of text, images and tables 🦙
- MarkItDown by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!) ✒️
You can use this new feature in your Python scripts (check the attached code snippet! 😉) and in the command line interface as well! 🐍
Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown
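The trade-offs among the three readers can be turned into a small selection rule. The sketch below is hypothetical and does not use the PdfItDown API: the function name `suggest_reader`, the lowercase reader labels, the extension table and the fallback to LlamaParse are all assumptions that paraphrase the strengths listed in the post.

```python
from pathlib import Path

# Hypothetical extension-to-reader table, paraphrasing the post:
# Docling for presentations, spreadsheets and Word documents,
# MarkItDown for unusual input formats such as XML, JSON and ZIP.
READER_HINTS = {
    ".pptx": "docling",
    ".xlsx": "docling",
    ".docx": "docling",
    ".xml": "markitdown",
    ".json": "markitdown",
    ".zip": "markitdown",
}


def suggest_reader(file_path: str, default: str = "llamaparse") -> str:
    """Suggest a reader label from the file extension (case-insensitive).

    Falling back to "llamaparse" for anything else is an assumption made
    for this sketch, not documented PdfItDown behavior.
    """
    return READER_HINTS.get(Path(file_path).suffix.lower(), default)
```

For instance, `suggest_reader("budget.XLSX")` picks the Docling label, while a file with an extension missing from the table falls back to the default.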

Organizations

Social Post Explorers, Hugging Face Discord Community, GreenFit AI

as-cle-bert's activity

published an article 10 months ago

_Repetita iuvant_: how to improve AI code generation

By as-cle-bert • 5