prithivMLmods commited on
Commit
820f3a9
·
verified ·
1 Parent(s): 9b1ece9

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +3 -1
README.md CHANGED
@@ -10,9 +10,11 @@ tags:
10
  - text-generation-inference
11
  ---
12
  ![9.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Q9thNwcmuwMvot0uvIQtj.png)
 
13
  # **Blazer.1-7B-Vision**
14
 
15
- Blazer.1-7B-Vision `4-bit precision` is based on the Qwen2-VL model, fine-tuned for raw document annotation extraction, optical character recognition (OCR), and solving math problems with LaTeX formatting. This model integrates a conversational approach with advanced visual and textual understanding to effectively handle multi-modal tasks. Key enhancements include state-of-the-art (SoTA) performance in understanding images of various resolutions and aspect ratios, as demonstrated by its success on visual understanding benchmarks such as MathVista, DocVQA, RealWorldQA, and MTVQA. Additionally, it excels in video comprehension, capable of processing videos over 20 minutes in length for high-quality video-based question answering, dialogue, and content creation. Blazer.1-2B-Vision also functions as an intelligent agent capable of operating devices like mobile phones and robots, thanks to its complex reasoning and decision-making abilities, enabling automatic operations based on visual environments and text instructions. To serve global users, the model offers multilingual support, understanding texts in a wide range of languages, including English, Chinese, most European languages, Japanese, Korean, Arabic, and Vietnamese.
 
16
 
17
  # **Use it With Transformer**
18
 
 
10
  - text-generation-inference
11
  ---
12
  ![9.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Q9thNwcmuwMvot0uvIQtj.png)
13
+
14
  # **Blazer.1-7B-Vision**
15
 
16
+ Blazer.1-7B-Vision `4-bit precision` is based on the Qwen2-VL model, fine-tuned for raw document annotation extraction, optical character recognition (OCR), and solving math problems with LaTeX formatting. This model integrates a conversational approach with advanced visual and textual understanding to effectively handle multi-modal tasks. Key enhancements include state-of-the-art (SoTA) performance in understanding images of various resolutions and aspect ratios, as demonstrated by its success on visual
17
+ understanding benchmarks such as MathVista, DocVQA, RealWorldQA, and MTVQA. Additionally, it excels in video comprehension, capable of processing videos over 20 minutes in length for high-quality video-based question answering, dialogue, and content creation. Blazer.1-7B-Vision also functions as an intelligent agent capable of operating devices like mobile phones and robots, thanks to its complex reasoning and decision-making abilities, enabling automatic operations based on visual environments and text instructions. To serve global users, the model offers multilingual support, understanding texts in a wide range of languages, including English, Chinese, most European languages, Japanese, Korean, Arabic, and Vietnamese.
18
 
19
  # **Use it With Transformer**
20