It runs, but

#4
by rakmik - opened

# Imports
import fitz  # PyMuPDF
from teapotai import TeapotAI

# Load and extract text from PDF
pdf_path = '/content/Understanding_Climate_Change.pdf'
doc = fitz.open(pdf_path)
pdf_text = ""

# Extract text from all pages
for page in doc:
    pdf_text += page.get_text()

doc.close()

# Initialize TeapotAI with PDF content as document
teapot_ai = TeapotAI(documents=[pdf_text])

# Ask a question about the PDF
query = "What is the topic of the book?"
answer = teapot_ai.query(query=query, context=pdf_text)

[TeapotAI ASCII-art startup banner]
Loading Model: teapotai/teapotllm Revision: 699ab39cbf586674806354e92fbd6179f9a95f4a
Device set to use cpu
Device set to use cpu
Generating embeddings for documents...
Document Embedding: 100%|██████████| 1/1 [00:04<00:00, 4.37s/doc]
Token indices sequence length is longer than the specified maximum sequence length for this model (12644 > 512). Running this sequence through the model will result in indexing errors

Teapot AI org

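You need to use proper document chunking and RAG settings so that you don't overfill the context. Here the whole PDF is passed as a single document and again as `context`, which produces 12644 tokens against the model's 512-token limit.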
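As a rough illustration of the chunking step, here is a minimal sketch that splits the extracted text into overlapping word-based chunks. The helper `chunk_text` and its defaults (`max_words=100`, `overlap=20`) are hypothetical and not part of teapotai; word counts only approximate token counts, so the sizes would need tuning against the 512-token limit.

```python
def chunk_text(text, max_words=100, overlap=20):
    """Split text into overlapping chunks of at most max_words words.

    Hypothetical helper, not part of teapotai. Words are only a rough
    proxy for tokens, so keep max_words well below the model's limit.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Possible usage with the objects from the post (not run here):
# chunks = chunk_text(pdf_text)
# teapot_ai = TeapotAI(documents=chunks)   # one document per chunk
# answer = teapot_ai.query(query=query)    # assumes `context` may be omitted
```

The idea is to let retrieval pick a few relevant chunks instead of sending the full PDF text as `context`. Whether `query` can be called without `context`, and how many chunks are retrieved per query, should be checked against the teapotai documentation.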

zakerytclarke changed discussion status to closed
