---
license: apache-2.0
language:
- en
base_model:
- prithivMLmods/docscopeOCR-7B-050425-exp
tags:
- DREX
- Document
- DocScope
- text-generation-inference
- Markdown Live Preview
datasets:
- prithivMLmods/Openpdf-Analysis-Recognition
pipeline_tag: image-text-to-text
library_name: transformers
---

![DREX-062225-exp.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/u_GfrI307c69Hc4Ku3iHs.png)

# **DREX-062225-exp**

> The **DREX-062225-exp** (**Document Retrieval and Extraction eXpert**) model is a specialized fine-tuned version of **docscopeOCR-7B-050425-exp**, optimized for **Document Retrieval**, **Content Extraction**, and **Analysis Recognition**. Built on top of the Qwen2.5-VL architecture, it extends the base model's document-comprehension capabilities through focused training on the Openpdf-Analysis-Recognition dataset for document analysis and information extraction tasks.

> [!note]
> DREX: Document Retrieval and Extraction eXpert [ experimental ]

# Key Enhancements

* **Advanced Document Retrieval**: Specialized capabilities for locating and retrieving specific information from complex document structures and layouts.
* **Enhanced Content Extraction**: Optimized for extracting structured data, key information, and relevant content from diverse document types, including reports, forms, and technical documentation.
* **Superior Analysis Recognition**: Fine-tuned recognition abilities for document analysis tasks, pattern identification, and contextual understanding of document hierarchies.
* **Inherited OCR Excellence**: Retains the advanced OCR capabilities of the base docscopeOCR model, including mathematical LaTeX formatting and multi-language support.
* **Document-Centric Understanding**: Specialized training for understanding document relationships, cross-references, and contextual dependencies within complex document sets.

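
For a quick smoke test before the full Transformers workflow shown further below, the generic `image-text-to-text` pipeline declared in the card metadata can be used. This is a minimal sketch, assuming a recent `transformers` release that routes Qwen2.5-VL checkpoints through this pipeline; the image URL and prompt are illustrative only.

```python
from transformers import pipeline

# Lightweight alternative to the explicit model/processor setup in "Quick Start with Transformers"
pipe = pipeline(
    "image-text-to-text",
    model="prithivMLmods/DREX-062225-exp",
    torch_dtype="auto",
    device_map="auto",
)

# Chat-style input following the pipeline's documented message format
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
            {"type": "text", "text": "Extract the key information from this document as Markdown."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=128, return_full_text=False)
print(outputs[0]["generated_text"])
```

For finer control over preprocessing, generation, and batching, use the explicit model/processor workflow in the Quick Start section below.
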
---

# Markdown (.MD) - Inference

![1.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/MbQ4l2xsMD3kUppqHC1_H.png)

---

![2.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/k_N9NppbakBo4iJM7LJnR.png)

---

# Quick Start with Transformers

The example below uses the `qwen_vl_utils` helper package in addition to `transformers`.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model with automatic dtype selection and device placement
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/DREX-062225-exp", torch_dtype="auto", device_map="auto"
)

# The processor handles both image preprocessing and chat templating
processor = AutoProcessor.from_pretrained("prithivMLmods/DREX-062225-exp")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Extract and analyze the key information from this document."},
        ],
    }
]

# Build the prompt text and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Generate, then strip the prompt tokens from the output before decoding
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

---

## Training Details

| Parameter              | Value                                |
|------------------------|--------------------------------------|
| **Dataset**            | Openpdf-Analysis-Recognition         |
| **Dataset Size**       | 6,910 samples                        |
| **Base Model**         | docscopeOCR-7B-050425-exp            |
| **Model Architecture** | `Qwen2_5_VLForConditionalGeneration` |
| **Hardware**           | 2 × A40 GPUs (19 vCPUs)              |
| **Total Disk**         | 280,000 MB (~280 GB)                 |
| **Training Time**      | 3,407 seconds (~0.95 hours)          |
| **Warmup Steps**       | 250                                  |
| **Precision**          | bfloat16                             |

> [!note]
> This model builds upon the robust foundation of docscopeOCR-7B-050425-exp with specialized training for document retrieval and extraction tasks.

# Intended Use

This model is specifically designed for:

* **Document Retrieval**: Efficiently locating specific information within large document collections and complex layouts.
* **Content Extraction**: Precise extraction of structured data, tables, forms, and key information from various document types.
* **Analysis Recognition**: Advanced recognition and analysis of document patterns, structures, and contextual relationships.
* **Enterprise Document Processing**: Automated processing of business documents, reports, contracts, and administrative forms.
* **Research Document Analysis**: Academic paper analysis, citation extraction, and research document comprehension.
* **Regulatory Compliance**: Processing of compliance documents, regulatory filings, and standardized reporting formats.

# Limitations

* Inherits the computational requirements of the base docscopeOCR model, so substantial resources are needed for optimal performance.
* Performance may vary on document types that differ significantly from the Openpdf-Analysis-Recognition training dataset.
* May show reduced accuracy on extremely specialized or domain-specific document formats not covered in training.
* Long-document processing requires adequate memory allocation and may not be suitable for real-time streaming applications.
* Optimal performance depends on proper visual-token configuration and input preprocessing; a configuration sketch follows this list.

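
The visual-token budget can be tuned through the processor. This is a minimal sketch, assuming the model exposes the same `min_pixels`/`max_pixels` options as the Qwen2.5-VL `AutoProcessor` it is based on; the values below are illustrative, not recommended settings.

```python
from transformers import AutoProcessor

# Each visual token corresponds to roughly a 28x28 pixel patch, so bounding the
# pixel budget bounds the number of visual tokens (and therefore memory use).
min_pixels = 256 * 28 * 28   # lower bound: at least ~256 visual tokens per image
max_pixels = 1280 * 28 * 28  # upper bound: at most ~1280 visual tokens per image

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/DREX-062225-exp",
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)
```

A lower `max_pixels` reduces memory use at the cost of legibility on dense pages; the trade-off should be validated per document type.
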
## References

- **Base Model**: docscopeOCR-7B-050425-exp
  [https://huggingface.co/prithivMLmods/docscopeOCR-7B-050425-exp](https://huggingface.co/prithivMLmods/docscopeOCR-7B-050425-exp)
- **DocVLM: Make Your VLM an Efficient Reader**
  [https://arxiv.org/pdf/2412.08746v1](https://arxiv.org/pdf/2412.08746v1)
- **YaRN: Efficient Context Window Extension of Large Language Models**
  [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)
- **Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution**
  [https://arxiv.org/pdf/2409.12191](https://arxiv.org/pdf/2409.12191)
- **Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond**
  [https://arxiv.org/pdf/2308.12966](https://arxiv.org/pdf/2308.12966)
- **A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy**
  [https://arxiv.org/pdf/2412.02210](https://arxiv.org/pdf/2412.02210)