Tell me more

#1
by Woziii - opened

Hi πŸ‘‹ !

I'm really interested about your work!

Your RAG system is very ingenious. I've been looking for examples of RAG systems on space for a long time, but you're one of the only ones to use zeroGPU and FAISS.

If you don't mind, I have a few questions.

I've explored your work and in particular your wikipedia database. How did you create and store the embeddings?

What does your RAG dataset look like before converting it to .parquet format?

Thanks for your work, it's really useful!

Owner
β€’
edited Jul 4

Hi @Woziii
Thanks for your feedback ❀️
As for the full step-by-step guide on how I made this work, you will find it in my blogpost here : https://huggingface.co/blog/not-lain/rag-chatbot-using-llama3 .

image.png

As for the dataset, if you load your data using huggingface datasets (no matter the dataset) , and then apply data.push_to_hub your dataset will be pushed to huggingface and converted to parquet automatically.
you can read more about huggingface datasets at https://huggingface.co/docs/datasets/loading.

Hope this helps, if you still have any more questions feel free to reach out at any time πŸ€—

Your work is incredible! I'm going to use ☺️ for inspiration. However, I'm going to modify the structure slightly, especially the "prompt format" so that the embedded dataset injects itself into the prompt system. I'll be reading your blog with great interest πŸ˜‰

Owner

Thank you so much, that's really sweet of you, let me know how it goes on your side ❀️

If you have discord account, let me know πŸ€—

Owner

yeh sure, my discord username is not_lain you can dm me at any time, or you can join the huggingface discord server and talk with the rest of the community
(i'm pretty active at the HF discord server γƒΎ(≧ β–½ ≦)ゝ)

Sign up or log in to comment