pypdf2 arxiv numpy pandas pymupdf transformers