Clelia Astra Bertelli
as-cle-bert's activity

Well, I might have a tool for you: pdf2notes (https://github.com/AstraBert/pdf2notes) is an AI-powered, open-source solution that lets you turn your unstructured and chaotic PDFs into nice and well-ordered notes in a matter of seconds!
How does it work?
- You first upload a document
- LlamaParse by LlamaIndex extracts the text from the document, using DeepMind's Gemini 2 Flash to perform multi-modal parsing
- Llama-3.3-70B, served via Groq, turns the extracted text into notes (sketched right below)!
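Here's roughly what that parse-then-summarize core looks like in code. This is just a minimal sketch, not the actual pdf2notes implementation: the input file name and the prompt are made up, and it assumes the llama-parse and llama-index-llms-groq packages plus the corresponding API keys.

from llama_parse import LlamaParse
from llama_index.llms.groq import Groq

parser = LlamaParse(result_type="markdown")    # multimodal parsing of the uploaded PDF
llm = Groq(model="llama-3.3-70b-versatile")    # Llama-3.3-70B served by Groq

docs = parser.load_data("lecture_slides.pdf")  # hypothetical input file
raw_text = "\n\n".join(doc.text for doc in docs)

notes = llm.complete(
    "Turn the following text into clear, well-ordered study notes:\n\n" + raw_text
)
print(notes.text)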
The notes aren't perfect, or you want more in-depth insights? No problem:
- Send a direct message to the chatbot
- The chatbot will retrieve the chat history from a Postgres database
- Llama-3.3-70B will produce the answer you need
All of this is nicely wrapped within a seamless backend-to-frontend framework powered by Gradio and FastAPI.
And you can even spin it up easily and locally, using Docker.
So, what are you waiting for? Go turn your hundreds of pages of chaotic learning material into neat and elegant notes ➡️ https://github.com/AstraBert/pdf2notes
And, if you would like an online demo, feel free to drop a comment - we'll see what we can build!

GitHub 👉 https://github.com/AstraBert/ragcoon
Are you building a startup and you're stuck in the process, trying to navigate hundreds of resources, suggestions and LinkedIn posts?
Well, fear no more, because RAGcoon🦝 is here to do some of the job for you:
- It's built on free resources written by successful founders
- It performs complex retrieval operations, exploiting "vanilla" hybrid search, query expansion with a hypothetical document approach and multi-step query decomposition (a minimal sketch of the hybrid retrieval setup follows the component list below)
- It evaluates the reliability of the retrieved context, and the relevancy and faithfulness of its own responses, in an auto-correction effort
RAGcoon🦝 is open-source and relies on easy-to-use components:
- LlamaIndex is at the core of the agent architecture, provisions the integrations with language models and vector database services, and performs evaluations
- Qdrant is your go-to, versatile and scalable companion for vector database services
- Groq provides lightning-fast LLM inference to support the agent, giving it the full power of QwQ-32B by Qwen
- Hugging Face provides the embedding models used for dense and sparse retrieval
- FastAPI wraps the whole backend into an API interface
- Mesop by Google is used to serve the application frontend
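As promised above, here's a rough idea of what the hybrid (dense + sparse) index behind the agent could look like with LlamaIndex and Qdrant. It's a sketch, not RAGcoon's actual code: the collection name, the resources folder and the model ids are illustrative, and it assumes a local Qdrant instance plus the relevant LlamaIndex integration packages.

import qdrant_client
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(url="http://localhost:6333")  # local Qdrant instance

vector_store = QdrantVectorStore(
    client=client,
    collection_name="founder_resources",   # illustrative collection name
    enable_hybrid=True,                    # dense + sparse retrieval
    fastembed_sparse_model="Qdrant/bm25",  # sparse encoder
)

docs = SimpleDirectoryReader("./resources").load_data()  # illustrative folder of founder resources
index = VectorStoreIndex.from_documents(
    docs,
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),  # dense encoder
)

retriever = index.as_retriever(vector_store_query_mode="hybrid", similarity_top_k=5)
nodes = retriever.retrieve("How should I validate my startup idea before building?")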
RAGcoon🦝 can be spun up locally - it's Docker-ready, and you can find the whole code to reproduce it on GitHub 👉 https://github.com/AstraBert/ragcoon
But there might be room for an online version of RAGcoon🦝: let me know if you would use it - we can connect and build it together!

GitHub 👉 https://github.com/AstraBert/diRAGnosis
PyPI 👉 https://pypi.org/project/diragnosis/
It's called diRAGnosis and it is a lightweight framework that helps you diagnose the performance of LLMs and retrieval models in RAG applications.
You can launch it as an application locally (it's Docker-ready!) or, if you want more flexibility, you can integrate it in your code as a Python package.
The workflow is simple:
- You choose your favorite LLM provider and model (supported, for now, are Mistral AI, Groq, Anthropic, OpenAI and Cohere)
- You pick the embedding model provider and the embedding model you prefer (supported, for now, are Mistral AI, Hugging Face, Cohere and OpenAI)
- You prepare and provide your documents
- Documents are ingested into a Qdrant vector database and transformed into a synthetic question dataset with the help of LlamaIndex
- The LLM is evaluated for the faithfulness and relevancy of its retrieval-augmented answers to the questions
- The embedding model is evaluated for hit rate and mean reciprocal rank (MRR) of the retrieved documents (see the sketch right after this list)
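As a reference for the last few steps, here's roughly how they map onto LlamaIndex primitives. This is a sketch of the underlying building blocks, not diRAGnosis's own API: the documents folder, the chunk size and the question-generation model are illustrative.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator, generate_question_context_pairs
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # illustrative question-generation model

# ingest the documents and split them into nodes
docs = SimpleDirectoryReader("./my_docs").load_data()
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(docs)
index = VectorStoreIndex(nodes)

# build a synthetic question dataset over the ingested nodes
qa_dataset = generate_question_context_pairs(nodes, llm=llm, num_questions_per_chunk=2)

# evaluate the retriever (and thus the embedding model) for hit rate and MRR
retriever_evaluator = RetrieverEvaluator.from_metric_names(
    ["hit_rate", "mrr"], retriever=index.as_retriever(similarity_top_k=5)
)
for query_id, query in qa_dataset.queries.items():
    result = retriever_evaluator.evaluate(query=query, expected_ids=qa_dataset.relevant_docs[query_id])
    print(result.metric_vals_dict)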
And the cool thing is that all of this is intuitive and completely automated: you plug it in, and it works!
Even cooler? This is all built on top of LlamaIndex and its integrations: no need for tons of dependencies or fancy workarounds
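And on the LLM side, the faithfulness and relevancy checks look roughly like this with LlamaIndex's evaluators. Again, a sketch rather than diRAGnosis's own API: the toy document, the query and the judge model are made up.

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # illustrative judge and answering model

# tiny toy corpus standing in for your real documents
index = VectorStoreIndex.from_documents(
    [Document(text="The warranty covers manufacturing defects for two years.")]
)
query_engine = index.as_query_engine(llm=llm)

query = "What does the warranty cover?"
response = query_engine.query(query)

faithfulness = FaithfulnessEvaluator(llm=llm).evaluate_response(response=response)
relevancy = RelevancyEvaluator(llm=llm).evaluate_response(query=query, response=response)
print(faithfulness.passing, relevancy.passing)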
And if you're a UI lover, Gradio and FastAPI are there to provide you with a seamless backend-to-frontend experience.
So now it's your turn: you can either get diRAGnosis from GitHub 👉 https://github.com/AstraBert/diRAGnosis
or just run a quick and painless:
uv pip install diragnosis
to get the package installed (lightning-fast) in your environment.
Have fun and feel free to leave feedback and feature/integration requests on GitHub issues ✨

Hi there, just wanted to reach out also here, so that people who see our conversation know that this feature has been integrated: you can now find it in v0.1.0 of the package, already installable via pip.
Have fun!

I did not specify any configuration, but I'm pretty sure we could play around with Supabase and set a login/logout status for the user (say, the user last logged in at time X and logged out at time Y; if Y > X, then the user can log in again, otherwise they cannot).
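Just to make the idea concrete, something along these lines could work. It is purely a sketch, not an existing feature of the package: the "auth_status" table and its "last_login"/"last_logout" columns are hypothetical.

from datetime import datetime
from supabase import create_client

supabase = create_client("https://<your-project>.supabase.co", "<your-anon-key>")

def can_log_in(username: str) -> bool:
    # read the hypothetical last_login / last_logout timestamps for this user
    row = (
        supabase.table("auth_status")
        .select("last_login, last_logout")
        .eq("username", username)
        .single()
        .execute()
        .data
    )
    last_login = datetime.fromisoformat(row["last_login"])
    last_logout = datetime.fromisoformat(row["last_logout"])
    # the user may log in again only if their last logout is more recent than their last login
    return last_logout > last_login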
If you want, I can put it in the roadmap for the next release of the package: then I would ask you to open an issue here: https://github.com/AstraBert/streamlit_supabase_auth_ui/issues so that I can add it to the milestone for v0.1.0 :)

I could not find it either back in the day, when I wanted to suppress it, but my suspicion is that it is linked to some not-so-up-to-date portions of the code (the code is based on a repo that used Streamlit 1.34, I believe). Nevertheless, what I did in my personal projects was to suppress all the warnings with:
from warnings import filterwarnings
filterwarnings(action="ignore")
# source -> https://www.geeksforgeeks.org/how-to-disable-python-warnings/
Hope this helps!

Hi! Yes, the code is open and you can modify it for your projects :)
If you want to change the language of the components, you just need to modify the widgets.py script, i.e. https://github.com/AstraBert/streamlit_supabase_auth_ui/blob/main/streamlit_supabase_auth_ui/widgets.py

And, believe me, this is not clickbait!
GitHub 👉 https://github.com/AstraBert/PapersChat
Demo 👉 as-cle-bert/PapersChat
The app is called PapersChat, and it is aimed at making chatting with scientific papers easier.
Here is what the app does:
- Parses the papers that you upload thanks to LlamaIndex🦙 (either with LlamaParse or with simpler, local methods)
- Embeds documents both with a sparse and with a dense encoder to enable hybrid search
- Uploads the embeddings to Qdrant
- Activates an Agent based on mistralai/Mistral-Small-24B-Instruct-2501 that will reply to your prompt
- Retrieves information relevant to your question from the documents
- If no relevant information is found, it searches the PubMed and arXiv databases
- Returns a grounded answer to your prompt (a minimal sketch of this agent setup follows right after this list)
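As mentioned above, here's a minimal sketch of how such an agent can be wired up with LlamaIndex. It is not PapersChat's actual code: the toy document, the embedding model and the placeholder fallback tool are illustrative, and it assumes a Mistral API key plus the relevant LlamaIndex integration packages.

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.mistralai import MistralAI

llm = MistralAI(model="mistral-small-latest")  # stand-in for Mistral-Small-24B-Instruct-2501

# toy stand-in for the Qdrant-backed hybrid index built from the uploaded papers
index = VectorStoreIndex.from_documents(
    [Document(text="Parsed paper text would go here.")],
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
)

papers_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(llm=llm),
    name="papers",
    description="Answers questions using the uploaded papers.",
)

def search_external(query: str) -> str:
    """Placeholder for the PubMed/arXiv fallback search."""
    return "No external results (placeholder)."

# the agent picks the papers tool first and falls back to the external search if needed
agent = ReActAgent.from_tools(
    [papers_tool, FunctionTool.from_defaults(fn=search_external)],
    llm=llm,
)
print(agent.chat("Which datasets do the papers use?"))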
How did I manage to make this application in just a few hours?
Three key points:
- LlamaIndex🦙 provides countless integrations with LLM providers, text embedding models and vectorstore services, and takes care of the internal architecture of the Agent. You just plug it in, and it works!
- Qdrant is a vector database service extremely easy to set up and use: you just need a one-line Docker command
- Gradio makes frontend development painless and fast, while still providing modern and responsive interfaces
And a bonus point:
- Deploying the demo app couldn't be easier if you use Gradio-based Hugging Face Spaces 🤗
So, no more excuses: build your own AI agent today and do it fast, (almost) for free and effortlessly.
And if you need a starting point, the code for PapersChat is open and fully reproducible on GitHub 👉 https://github.com/AstraBert/PapersChat

GitHub 👉 https://github.com/AstraBert/SciNewsBot
BlueSky 👉 https://bsky.app/profile/sci-news-bot.bsky.social
Hi there HF Community! 🤗
I just created a very simple AI-powered bot that shares fact-checked news about Science, Environment, Energy and Technology on BlueSky :)
The bot takes news from Google News, filters out the sources that are not represented in the Media Bias Fact Check database, and then evaluates the reliability of the source based on the MBFC metrics. After that, it creates a catchy headline for the article and publishes the post on BlueSky.
The cool thing? SciNewsBot is open-source and cheap to maintain, as it is based on mistralai/Mistral-Small-24B-Instruct-2501 (via the Mistral API). You can reproduce it locally, spinning it up on your machine, and even launch it in the cloud through a comfy Docker setup.
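For a feel of the headline-then-post step, here's a minimal sketch. It is not the bot's actual code: it assumes the mistralai and atproto Python packages, credentials in environment variables, and a made-up article.

import os

from atproto import Client
from mistralai import Mistral

article_title = "New catalyst cuts the cost of green hydrogen"  # made-up example article
article_url = "https://example.com/green-hydrogen"

# generate a catchy but factual headline with the Mistral API
mistral = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
headline = mistral.chat.complete(
    model="mistral-small-latest",  # stand-in for Mistral-Small-24B-Instruct-2501
    messages=[{
        "role": "user",
        "content": f"Write a catchy but factual one-line headline for this article: {article_title}",
    }],
).choices[0].message.content

# publish the post on BlueSky
bsky = Client()
bsky.login(os.environ["BSKY_HANDLE"], os.environ["BSKY_PASSWORD"])
bsky.send_post(text=f"{headline}\n{article_url}")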
Have fun and spread Science! ✨

Demo 👉 https://pqstem.org
GitHub 👉 https://github.com/AstraBert/PhiQwenSTEM
Hello HF community! 🤗
Ever struggled with a complex Maths problem or with a very hard Physics question? Well, fear no more, because now you can rely on PhiQwenSTEM, an assistant specialized in answering STEM-related questions!
The assistant can count on a knowledge base of 15k+ selected STEM question-answer pairs spanning the domains of Chemistry, Physics, Mathematics and Biochemistry (from EricLu/SCP-116K). It also relies on the combined power of microsoft/Phi-3.5-mini-instruct and Qwen/QwQ-32B-Preview to produce reliable and reasoned answers.
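If you want to peek at the knowledge-base source yourself, the dataset is on the Hugging Face Hub and can be pulled with the datasets library. This is just an exploration snippet, not PhiQwenSTEM's actual preprocessing, and the subset size is arbitrary.

from datasets import load_dataset

# EricLu/SCP-116K is the question-answer corpus the knowledge base is selected from
ds = load_dataset("EricLu/SCP-116K", split="train")
print(ds)  # inspect the available columns and the number of rows

# look at a few question-answer pairs
for row in ds.select(range(3)):
    print(row)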
For the next 30 days, you will be able to try the web demo for free: https://pqstem.org
In the GitHub repo you can find all the information to reproduce PhiQwenSTEM on your local machine, both via source code and with a comfy Docker setup: https://github.com/AstraBert/PhiQwenSTEM