Clelia Astra Bertelli
as-cle-bert
1427 followers · 40 following
AI & ML interests
Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching on Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source
Recent Activity
posted an update 10 days ago
Hey there, ingest-anything v1.0.0 just dropped with major changes:
- Embeddings: now works with Sentence Transformers, Jina AI, Cohere, OpenAI, and Model2Vec, all powered via Chonkie's AutoEmbeddings. No more local-only limitations.
- Vector DBs: now supports all LlamaIndex-compatible backends. Think: Qdrant, Pinecone, Weaviate, Milvus, etc. No more bottlenecks.
- File parsing: now plugs into any LlamaIndex-compatible data loader. Using LlamaParse, Docling or your own setup? You're covered.
Curious to know more? Try it out! https://github.com/AstraBert/ingest-anything
posted an update 11 days ago
One of the biggest challenges I've been facing since I started developing [PdfItDown](https://github.com/AstraBert/PdfItDown) was correctly handling the conversion of files like Excel sheets and CSVs: table conversion was bad and messy, almost unusable for downstream tasks 🫣
That's why today I'm excited to introduce readers, the new feature of PdfItDown v1.4.0!
With readers, you can choose among three (for now) flavors of text extraction and conversion to PDF:
- Docling, which does a fantastic job with presentations, spreadsheets and Word documents
- LlamaParse by LlamaIndex, suitable for more complex and articulated documents that mix text, images and tables
- MarkItDown by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON and ZIP files!)
You can use this new feature in your Python scripts (check the attached code snippet!) and in the command line interface as well!
Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown