	---
	license: mit
	language:
	- en
	pipeline_tag: object-detection
	tags:
	- irs
	- '1040'
	- '2023'
	- tax
	- form
	---


	# Finetuned RT-DETR model to extract tables from IRS 1040 2023 forms

	For parsing IRS Form 1040 documents, I previously uploaded a trained Donut model, which is based on vision transformers. The Donut model can parse a 1040 form in a single shot and return the parsed form values in JSON format. Although vision transformers are cutting-edge AI models, they still have limitations on OCR-related tasks, where they sometimes hallucinate. They also do not provide a confidence level for the extracted field data, which makes it extremely challenging to decide downstream whether to accept or drop a particular parsed value.

	Especially when dealing with financial data, like Form 1040, accuracy and confidence values are of utmost importance.

	This article provides a working example of using multiple AI models to perform OCR on Form 1040 and extract text values in JSON format, with a confidence level for each field.



	```text
	+------------------------+
	| Classification Model   |  (classifies IRS Form 1040 images by page)
	+------------------------+
	            |
	            v
	+------------------------+
	| RT-DETR                |
	| Object Detection Model |  (trained to extract the header and tables from Form 1040)
	+------------------------+
	            |
	            v
	+------------------------+
	| Deepseek Vision        |  (a Deepseek or Qwen vision model extracts the data as JSON)
	| or Qwen Vision         |
	+------------------------+
	```
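The three stages in the diagram can be chained roughly as follows. This is only a minimal sketch: `Region`, `parse_form_page`, and the three callables passed into it are hypothetical names introduced for illustration, and the actual classification and vision-model calls are left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Region:
    """One detected region of a form page (hypothetical container)."""
    class_name: str    # e.g. "1040_pg1_header"
    confidence: float  # detector confidence in [0, 1]
    crop: Any          # cropped image, in whatever type the extractor accepts


def parse_form_page(
    image: Any,
    classify_page: Callable[[Any], str],
    detect_regions: Callable[[Any], List[Region]],
    extract_json: Callable[[Any], dict],
) -> Dict[str, Any]:
    """Run the classify -> detect -> extract stages on a single page image."""
    page = classify_page(image)      # stage 1: which page of the form is this?
    regions = detect_regions(image)  # stage 2: header and table regions
    parsed = []
    for region in regions:
        parsed.append({
            "class_name": region.class_name,
            # keep the detector confidence next to the extracted values
            "confidence": region.confidence,
            "fields": extract_json(region.crop),  # stage 3: vision model output
        })
    return {"page": page, "regions": parsed}
```

Passing the stages in as callables keeps the sketch independent of any particular classifier or vision model, so either Deepseek Vision or Qwen Vision could be wrapped as `extract_json`.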

	## Classes for form 1040
	The RT-DETR model is finetuned on 6 classes that correspond to regions of the 2023 Form 1040.

	### Page 1 classes
	1040_pg1_header - represents the header of page 1

	1040_pg1_tax_tbl - represents a table with financial values

	1040_pg1_sch_b - represents a table with Schedule B financial values

	### Page 2 classes
	1040_pg2_tax_tbl

	1040_pg2_pay_tbl

	1040_pg2_signature_frm
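
Because every class name encodes its page, detections can be grouped by page with a small lookup. The sketch below assumes only the six class names listed above; `CLASSES_BY_PAGE`, `page_of`, and `group_by_page` are hypothetical helper names, not part of the model.

```python
from typing import Dict, Iterable, List

# The six class names from this model card, keyed by form page.
CLASSES_BY_PAGE = {
    1: ("1040_pg1_header", "1040_pg1_tax_tbl", "1040_pg1_sch_b"),
    2: ("1040_pg2_tax_tbl", "1040_pg2_pay_tbl", "1040_pg2_signature_frm"),
}


def page_of(class_name: str) -> int:
    """Return the form page (1 or 2) that a class name belongs to."""
    for page, names in CLASSES_BY_PAGE.items():
        if class_name in names:
            return page
    raise ValueError(f"Unknown class name: {class_name!r}")


def group_by_page(class_names: Iterable[str]) -> Dict[int, List[str]]:
    """Group detected class names by the page they belong to."""
    grouped: Dict[int, List[str]] = {1: [], 2: []}
    for name in class_names:
        grouped[page_of(name)].append(name)
    return grouped
```

A page image that yields classes from both pages is probably worth a manual look, since each page should only contain its own three regions.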

	# Synthetic Data for IRS 1040 2023 Form Page 1
	![image](https://github.com/user-attachments/assets/b56cca04-1db9-497d-bb34-46b423207984)
	## Cropped - Class: 1040_pg1_header
	![bboxes_pg1_header](https://github.com/user-attachments/assets/c6bf7b76-fc8c-4572-ab31-790d1391adf3)
	## Cropped - Class: 1040_pg1_tax_tbl
	![bboxes_pg1_tax_table](https://github.com/user-attachments/assets/037b7bf8-0add-410e-b85e-3bbe6fa2f29a)
	## Cropped - Class: 1040_pg1_sch_b
	![bboxes_pg1_tbl2](https://github.com/user-attachments/assets/47e49711-1b90-46a1-8f0a-4770c01e6d2c)

	# Synthetic Data for IRS 1040 2023 Form Page 2
	![redlined_pg2](https://github.com/user-attachments/assets/320d422b-4c8f-4134-9d3a-c3d94c72df51)
	## Cropped - Class: 1040_pg2_tax_tbl
	![bboxes_pg2_tax](https://github.com/user-attachments/assets/07a2200d-5546-4539-82a3-35c1bc6b7658)
	## Cropped - Class: 1040_pg2_pay_tbl
	![bboxes_pg2_small_tbl](https://github.com/user-attachments/assets/21b914b7-3666-4478-ae67-1b78fac55de3)
	## Cropped - Class: 1040_pg2_signature_frm
	![bboxes_pg2_signature](https://github.com/user-attachments/assets/ae5df46a-c878-406b-9472-208d49be49c4)


	```python
	from ultralytics import RTDETR
	import cv2
	import supervision as sv

	# Path to the finetuned weights (replace with your local path)
	model_file = 'replace with path to model file /1040_2023_v1.pt'

	# Load the trained model from a local path
	model = RTDETR(model_file)

	# Display model information (optional)
	model.info()

	image_path = 'path to source image'

	# Read the source image; cv2.imread returns None if the file cannot be read
	img = cv2.imread(image_path)
	if img is None:
	    raise FileNotFoundError(f"Could not read image: {image_path}")

	# Perform inference; imgsz is set to 1024 because the model was finetuned at this size
	results = model.predict(img, imgsz=1024)

	# Use the supervision library to parse the results and draw redline boxes
	detections = sv.Detections.from_ultralytics(results[0])

	# Create the bounding box and label annotators
	# (recent supervision releases name the box annotator sv.BoxAnnotator)
	bounding_box_annotator = sv.BoundingBoxAnnotator()
	label_annotator = sv.LabelAnnotator()

	# Build one "class_name confidence" label per detection
	labels = [
	    f"{class_name} {confidence:.2f}"
	    for class_name, confidence
	    in zip(detections['class_name'], detections.confidence)
	]

	# Annotate a copy of the image with labeled bounding boxes
	annotated_image = bounding_box_annotator.annotate(
	    scene=img.copy(),
	    detections=detections
	)
	annotated_image = label_annotator.annotate(
	    annotated_image, detections=detections, labels=labels
	)

	# Dummy counter for generated image names
	count = 0
	# Write the annotated image
	cv2.imwrite('redlined_' + str(count) + '.png', annotated_image)

	# Crop each bounding box and save it as a separate image
	for xyxy in detections.xyxy:
	    cropped_image = sv.crop_image(image=img, xyxy=xyxy)
	    count = count + 1
	    cv2.imwrite('bboxes_' + str(count) + '.png', cropped_image)
	```
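
Since the reason for adding a detection stage is to have confidence values for downstream decisions, one possible follow-up is to split detections at a threshold before cropping. This is a sketch only: `split_by_confidence` and `crop_filename` are hypothetical helpers, and the 0.5 threshold is an assumed placeholder, not a value recommended for this model.

```python
from typing import Iterable, List, Tuple

Detection = Tuple[int, str, float]  # (index, class_name, confidence)


def split_by_confidence(
    class_names: Iterable[str],
    confidences: Iterable[float],
    threshold: float = 0.5,  # assumed placeholder; tune it on your own data
) -> Tuple[List[Detection], List[Detection]]:
    """Split detections into (accepted, rejected) lists by confidence."""
    accepted: List[Detection] = []
    rejected: List[Detection] = []
    for index, (name, conf) in enumerate(zip(class_names, confidences)):
        target = accepted if conf >= threshold else rejected
        target.append((index, str(name), float(conf)))
    return accepted, rejected


def crop_filename(class_name: str, index: int) -> str:
    """Name a crop after its class rather than a bare running counter."""
    return f"{class_name}_{index}.png"
```

With the inference example above, the inputs would be `detections['class_name']` and `detections.confidence`, and each returned index can be used to look up the matching box in `detections.xyxy` before cropping.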