Update README.md
README.md
CHANGED
@@ -24,17 +24,16 @@ This model is a fine-tuned version of `microsoft/layoutlmv3-base` designed for t
 ### Model Description

 * **Developed by:** Parthesh Ingale
-* **
-* **Shared by \[optional]:** [parthesh111](https://huggingface.co/parthesh111)
 * **Model type:** Token Classification (NER)
 * **Language(s) (NLP):** English
 * **License:** Apache-2.0
-* **Finetuned from model

-### Model Sources

 * **Repository:** [https://huggingface.co/parthesh111/layoutlmv3-finetune-bioes-new](https://huggingface.co/parthesh111/layoutlmv3-finetune-bioes-new)
-* **Paper

 ## Uses

@@ -43,7 +42,7 @@ This model is a fine-tuned version of `microsoft/layoutlmv3-base` designed for t
 * Extract named entities from medical lab reports (scanned images).
 * Automate structured data extraction from semi-structured medical documents.

-### Downstream Use

 * Preprocessing step in EHR (Electronic Health Records).
 * PII-aware document processing.
@@ -81,7 +80,7 @@ import numpy as np
 import os
 from huggingface_hub import login

-# Login to Hugging Face using environment variable token
 HF_TOKEN = os.environ.get("HF_TOKEN")
 if not HF_TOKEN:
     st.error("Hugging Face token not found. Please set 'HF_TOKEN' as an environment variable.")
@@ -320,7 +319,7 @@ st.markdown("""

 ### Training Procedure

-#### Preprocessing

 * Images were preprocessed using PaddleOCR.
 * Bounding boxes normalized to 1000-scale.
@@ -330,10 +329,10 @@ st.markdown("""

 * **Training regime:** fp16 mixed precision
 * **Epochs:** 20
-* **Batch size:**
 * **Learning rate:** 5e-5

-#### Speeds, Sizes, Times

 * **Checkpoint size:** \~435 MB
 * **Training time:** \~2 hours on RTX 3060
@@ -367,7 +366,7 @@ LayoutLMv3 with token classification head using OCR input (image, text, and layo

 * PyTorch, Hugging Face Transformers, PaddleOCR, Streamlit

-## Citation

 **BibTeX:**

@@ -379,18 +378,10 @@ LayoutLMv3 with token classification head using OCR input (image, text, and layo
 howpublished = {\url{https://huggingface.co/parthesh111/layoutlmv3-finetune-bioes-new}},
 }
 ```
-## Glossary

 * **BIOES:** Beginning, Inside, Outside, End, Single tagging scheme used for NER.

-## More Information \[optional]
-
-For demo, Streamlit app, or usage questions, contact below.
-
-## Model Card Authors \[optional]
-
-* Parthesh Ingale
-
 ## Model Card Contact

 * **GitHub/HF:** [parthesh111](https://huggingface.co/parthesh111)
@@ -24,17 +24,16 @@ This model is a fine-tuned version of `microsoft/layoutlmv3-base` designed for t
 ### Model Description

 * **Developed by:** Parthesh Ingale
+* **Shared by:** [parthesh111](https://huggingface.co/parthesh111)
 * **Model type:** Token Classification (NER)
 * **Language(s) (NLP):** English
 * **License:** Apache-2.0
+* **Finetuned from model:** `microsoft/layoutlmv3-base`

+### Model Sources

 * **Repository:** [https://huggingface.co/parthesh111/layoutlmv3-finetune-bioes-new](https://huggingface.co/parthesh111/layoutlmv3-finetune-bioes-new)
+* **Paper:** N/A

 ## Uses

@@ -43,7 +42,7 @@ This model is a fine-tuned version of `microsoft/layoutlmv3-base` designed for t
 * Extract named entities from medical lab reports (scanned images).
 * Automate structured data extraction from semi-structured medical documents.

+### Downstream Use

 * Preprocessing step in EHR (Electronic Health Records).
 * PII-aware document processing.
@@ -81,7 +80,7 @@ import numpy as np
 import os
 from huggingface_hub import login

+# Login to Hugging Face using the environment variable token
 HF_TOKEN = os.environ.get("HF_TOKEN")
 if not HF_TOKEN:
     st.error("Hugging Face token not found. Please set 'HF_TOKEN' as an environment variable.")
@@ -320,7 +319,7 @@ st.markdown("""

 ### Training Procedure

+#### Preprocessing

 * Images were preprocessed using PaddleOCR.
 * Bounding boxes normalized to 1000-scale.
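The "normalized to 1000-scale" step in the Preprocessing list can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: it assumes the OCR step yields four-point polygons in pixel coordinates and that the page width and height are known. The helper names `quad_to_box` and `normalize_box` are hypothetical.

```python
def quad_to_box(quad):
    """Convert a four-point polygon [[x, y], ...] to an axis-aligned box."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return [min(xs), min(ys), max(xs), max(ys)]


def normalize_box(box, width, height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 range."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]
    # Clamp in case the OCR box falls slightly outside the page.
    return [min(1000, max(0, value)) for value in scaled]


# Hypothetical example: one text region on a 1000x500 pixel page.
quad = [[100, 50], [300, 50], [300, 90], [100, 90]]
print(normalize_box(quad_to_box(quad), width=1000, height=500))
# -> [100, 100, 300, 180]
```

Dividing by the page size before scaling makes the coordinates independent of scan resolution, which is why the same box layout can be compared across differently sized images.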
@@ -330,10 +329,10 @@ st.markdown("""

 * **Training regime:** fp16 mixed precision
 * **Epochs:** 20
+* **Batch size:** 1
 * **Learning rate:** 5e-5

+#### Speeds, Sizes, Times

 * **Checkpoint size:** \~435 MB
 * **Training time:** \~2 hours on RTX 3060
@@ -367,7 +366,7 @@ LayoutLMv3 with token classification head using OCR input (image, text, and layo

 * PyTorch, Hugging Face Transformers, PaddleOCR, Streamlit

+## Citation

 **BibTeX:**

@@ -379,18 +378,10 @@ LayoutLMv3 with token classification head using OCR input (image, text, and layo
 howpublished = {\url{https://huggingface.co/parthesh111/layoutlmv3-finetune-bioes-new}},
 }
 ```
+## Glossary

 * **BIOES:** Beginning, Inside, Outside, End, Single tagging scheme used for NER.

 ## Model Card Contact

 * **GitHub/HF:** [parthesh111](https://huggingface.co/parthesh111)
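The BIOES scheme defined in the Glossary can be illustrated with a small sketch that turns token-level entity spans into tags. The function name `spans_to_bioes` and the labels `NAME` and `TEST` are hypothetical, chosen only for illustration. The model's actual label set is not listed in this diff.

```python
def spans_to_bioes(num_tokens, spans):
    """Build BIOES tags from (start, end_exclusive, label) token spans."""
    tags = ["O"] * num_tokens  # Outside: tokens that belong to no entity
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"  # Single-token entity
        else:
            tags[start] = f"B-{label}"  # Beginning of the entity
            for index in range(start + 1, end - 1):
                tags[index] = f"I-{label}"  # Inside the entity
            tags[end - 1] = f"E-{label}"  # End of the entity
    return tags


# Hypothetical tokens and labels, not taken from the training data.
tokens = ["Patient", "John", "Michael", "Doe", "Hemoglobin", "13.5"]
spans = [(1, 4, "NAME"), (4, 5, "TEST")]
print(spans_to_bioes(len(tokens), spans))
# -> ['O', 'B-NAME', 'I-NAME', 'E-NAME', 'S-TEST', 'O']
```

Compared with plain BIO tagging, the explicit End and Single tags mark entity boundaries directly, at the cost of a larger tag set.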