# Imports

import fitz  # PyMuPDF
from teapotai import TeapotAI

# Load and extract text from PDF

pdf_path = '/content/Understanding_Climate_Change.pdf'
doc = fitz.open(pdf_path)
pdf_text = ""

# Extract text from all pages

for page in doc:
    pdf_text += page.get_text()

doc.close()

# Initialize TeapotAI with PDF content as document

teapot_ai = TeapotAI(documents=[pdf_text])

# Ask a question about the PDF

query = "What is the topic of the book?"
answer = teapot_ai.query(query=query, context=pdf_text)

[Teapot AI ASCII-art startup banner]
Loading Model: teapotai/teapotllm Revision: 699ab39cbf586674806354e92fbd6179f9a95f4a
Device set to use cpu
Device set to use cpu
Generating embeddings for documents...
Document Embedding: 100%|██████████| 1/1 [00:04<00:00, 4.37s/doc]
Token indices sequence length is longer than the specified maximum sequence length for this model (12644 > 512). Running this sequence through the model will result in indexing errors

rakmik

Apr 1

https://github.com/kim90000/teapotllm/blob/main/Xteapotllm.ipynb

zakerytclarke

Teapot AI org Apr 3

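You need to use proper document chunking and RAG settings to ensure you don't overfill the context. The warning above shows the whole PDF going in as one 12644-token sequence, while the model's maximum is 512.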
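
As a rough illustration of the chunking step, here is a minimal sketch that splits the extracted text into overlapping word windows before it is handed to TeapotAI. The helper name `chunk_text` and the values `max_words=200` and `overlap=40` are assumptions chosen for illustration, not library settings; word counts only approximate token counts, so the window is kept well under the 512-token limit from the warning.

```python
def chunk_text(text, max_words=200, overlap=40):
    """Split text into overlapping chunks of at most max_words words.

    Hypothetical helper: word counts are only a rough proxy for tokens,
    so max_words is set well below the model's 512-token limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Stand-in for the pdf_text extracted with PyMuPDF in the original post.
sample_text = " ".join(f"word{i}" for i in range(500))
chunks = chunk_text(sample_text)
print(len(chunks), "chunks; longest has",
      max(len(c.split()) for c in chunks), "words")

# The chunks would then replace the single large document, for example:
#   teapot_ai = TeapotAI(documents=chunks)
# Assumption: the query is then asked without passing the full pdf_text as
# context, so that only retrieved chunks reach the model.
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of embedding some text twice.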

zakerytclarke changed discussion status to closed Apr 3
