@Shreyas094 on Hugging Face: "Help me to upgrade my model. Hi all, so I am a complete beginner in coding…"

Shreyas094

posted an update Sep 10, 2024

Help me to upgrade my model.

Hi all, I am a complete beginner in coding. However, with the help of Claude (similar to Matt :P) and GPT-4o, I have been able to develop this RAG PDF summarizer/Q&A plus a web search tool.

The application is built specifically for summarization tasks, including summarizing financial documents, news articles, resumes, research documents, call transcripts, etc.

The Space can be found here: Shreyas094/SearchGPT

The news tool simply uses DuckDuckGo chat to generate the search results with the Llama 3.1 70B model.

I would like your support in fine-tuning the retrieval step so that it handles more unstructured documents.

John6666

Sep 10, 2024

I think changing this would change the search results somewhat, but there don't seem to be too many options to choose from.
I can give you some advice if I know how you want to enhance it.

https://huggingface.co/spaces/Shreyas094/SearchGPT/blob/main/app.py

def get_web_search_results(query: str, max_results: int = 10) -> List[Dict[str, str]]:
    try:
        results = list(DDGS().text(query, max_results=max_results))

https://pypi.org/project/duckduckgo-search/#2-text---text-search-by-duckduckgocom
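To make the "options to choose from" concrete, here is a minimal sketch of how the call above could expose the search options. It assumes the `region`, `safesearch` and `timelimit` keyword arguments described on the PyPI page linked above; the helper `build_search_kwargs` is hypothetical and not part of app.py, and the accepted values should be checked against the installed version of `duckduckgo-search`.

```python
from typing import Dict, List, Optional


def build_search_kwargs(max_results: int = 10,
                        region: str = "wt-wt",
                        safesearch: str = "moderate",
                        timelimit: Optional[str] = None) -> dict:
    """Hypothetical helper: collect and sanity-check options for DDGS().text()."""
    if safesearch not in {"on", "moderate", "off"}:
        raise ValueError("safesearch must be 'on', 'moderate' or 'off'")
    if timelimit not in {None, "d", "w", "m", "y"}:
        raise ValueError("timelimit must be None, 'd', 'w', 'm' or 'y'")
    return {
        "max_results": max_results,
        "region": region,          # e.g. "wt-wt" for no region, "us-en" for US English
        "safesearch": safesearch,
        "timelimit": timelimit,    # restrict to the last day / week / month / year
    }


def get_web_search_results(query: str, **options) -> List[Dict[str, str]]:
    """Run a text search with the chosen options (requires network access)."""
    # Imported lazily so the helper above can be used without the package installed.
    from duckduckgo_search import DDGS
    return list(DDGS().text(query, **build_search_kwargs(**options)))
```

For a news-style query, something like `get_web_search_results("central bank rate decision", timelimit="w")` would limit results to the past week, which is probably the option with the most visible effect on what comes back.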

Shreyas094

Sep 10, 2024

Hi John, thanks so much for the contribution. However, I would like to implement some upgrades to my RAG setup for the PDF summarization task. So far I have not worked much on the vector DB creation, chunking, indexing and embedding parts. I feel that working on these should improve the retrieval process, especially for research documents of 100-200 pages. If possible, could you provide some suggestions on that part? Thanks.
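As a starting point for the chunking part, here is a minimal sketch of word-based chunking with overlap. It is not taken from app.py; the function name `chunk_text` and the default sizes are assumptions chosen for illustration. The idea is that neighbouring chunks share some words, so a sentence that falls on a chunk boundary still appears whole in at least one chunk before the chunks are embedded and indexed.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks that overlap with their neighbours.

    chunk_size and overlap are counted in whitespace-separated words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

For long research PDFs it may be worth trying a few chunk sizes and overlaps and checking which setting retrieves the right passages for a handful of known questions, since the best values depend on the documents and on the embedding model.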

nicolollo

Sep 11, 2024

Bro, the "(similar to Matt)" killed me XD

Shreyas094

Sep 12, 2024

Hahaha, at least someone got it