bespokelabs/Bespoke-MiniChart-7B

Tags: Text Generation, Safetensors, English, qwen2_5_vl, conversational

pimpalgaonkar committed on Apr 23

Commit d5866e5 (verified) · 1 parent: 5dc2230

Update README.md

Files changed (1): README.md +147 -196

@@ -1,199 +1,150 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+language:
+- en
+pipeline_tag: text-generation
 ---
+<p align="center">
+    <img src="./Bespoke-Labs-Logo.png" width="550">
+</p>
+# Bespoke-MiniChart-7B
+[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1FEmlwGgn9209iQO-rs2-9UHPLoytwZMH?usp=sharing)
+This is an open-source chart-understanding vision-language model (VLM) developed at **Bespoke Labs** and maintained by **Liyan Tang** and **Bespoke Labs**. It sets a new state of the art in chart question answering for models with 7 billion parameters, and its average score across seven public benchmarks exceeds those of closed models such as Gemini-1.5-Pro and Claude-3.5.
+For details on how we trained the model, please see our blog post: <Blog Post Link>
+# Model Performance
+Our model achieves state-of-the-art chart-understanding performance among models of similar size. Its average score across the seven benchmarks (72.5) also surpasses those of closed models such as Gemini-1.5-Pro (72.3) and Claude-3.5 (72.2).
+| Model / Category                       | ChartQAPro (1637) | ChartQA (2500) | EvoChart (1250) | CharXiv (4000) | ChartX (1152) | ChartBench (2100) | MMC (808) | Average |
+|----------------------------------------|------------------:|---------------:|----------------:|---------------:|--------------:|------------------:|----------:|--------:|
+| **Open models&nbsp;(11B and smaller)** |                   |                |                 |                |               |                   |           |         |
+| InternVL-2.5-8B                        |                  –| **78.2**       | 53.0            | 55.7           | 49.5          | 44.7              | **85.5**  | –       |
+| Qwen2-VL-7B                            |                  –| **82.1**       | 54.5            | 53.5           | 50.8          | 50.8              | **83.9**  | –       |
+| Qwen2.5-VL-7B                          | **53.5**          | **86.0**       | 67.9            | 60.9           | 67.0          | 61.4              | **86.0**  | **69.0**|
+| **Ours**                               |                   |                |                 |                |               |                   |           |         |
+| Bespoke-MiniChart-7B                   | **56.7**          | **89.5**       | **71.8**        | **66.4**       | **68.9**      | **66.1**          | **88.4**  | **72.5**|
+| **Open models&nbsp;(32B and larger)**  |                   |                |                 |                |               |                   |           |         |
+| QVQ-72B-Preview                        |                  –| **84.2**       | 65.0            | 59.0           | 60.9          | 53.8              | **83.4**  | –       |
+| Qwen2.5-VL-32B                         | **58.4**          | **89.5**       | 74.3            | 66.9           | 64.5          | 59.8              | **89.6**  | **71.9**|
+| Qwen2.5-VL-72B                         | **59.0**          | **90.0**       | **76.8**        | 67.1           | **67.2**      | 61.5              | **91.2**  | **73.3**|
+| **Closed models**                      |                   |                |                 |                |               |                   |           |         |
+| GPT-4o                                 | **53.6**          | 85.7           | 71.7            | 67.8           | 54.3          | 46.1              | **89.1**  | 66.9    |
+| Gemini-1.5-Flash                       | **53.8**          | 85.6           | 67.5            | 67.7           | 63.5          | 58.1              | 82.1      | 68.3    |
+| Gemini-1.5-Pro                         | **59.2**          | **89.0**       | 72.0            | **69.9**       | 65.4          | 62.4              | 87.9      | **72.3**|
+| Claude-3.5                             | **56.6**          | 85.7           | **78.1**        | **69.7**       | 64.7          | 60.9              | **89.9**  | **72.2**|
+| Claude-3.7                             | **63.0**          | 86.1           | **80.1**        | **69.7**       | **69.2**      | **65.0**          | 88.4      | **74.5**|
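The Average column appears to be the unweighted mean of the seven benchmark scores, so each benchmark counts equally regardless of its size. (The parenthesized numbers in the header look like per-benchmark example counts; that reading is an assumption.) The following is a minimal sketch that recomputes the average for a few rows copied from the table:

```python
# Per-benchmark scores copied from the table, in column order:
# ChartQAPro, ChartQA, EvoChart, CharXiv, ChartX, ChartBench, MMC.
# Each entry maps a model to (scores, reported Average).
ROWS = {
    "Bespoke-MiniChart-7B": ([56.7, 89.5, 71.8, 66.4, 68.9, 66.1, 88.4], 72.5),
    "Qwen2.5-VL-7B": ([53.5, 86.0, 67.9, 60.9, 67.0, 61.4, 86.0], 69.0),
    "Gemini-1.5-Pro": ([59.2, 89.0, 72.0, 69.9, 65.4, 62.4, 87.9], 72.3),
    "Claude-3.5": ([56.6, 85.7, 78.1, 69.7, 64.7, 60.9, 89.9], 72.2),
}


def macro_average(scores):
    """Unweighted mean: every benchmark counts equally, regardless of its size."""
    return sum(scores) / len(scores)


for model, (scores, reported) in ROWS.items():
    computed = macro_average(scores)
    # The table reports averages rounded to one decimal place.
    print(f"{model:<22} computed={computed:.2f} rounded={round(computed, 1)} reported={reported}")
```

For these four rows, each computed mean rounds to the reported one-decimal Average. A mean weighted by benchmark size would generally give different values.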
+# Model Use
+```python
+import base64
+from io import BytesIO
+import matplotlib.pyplot as plt
+import requests
+from PIL import Image
+from vllm import LLM, SamplingParams
+# Prompt format; [QUESTION] is replaced with the user's question.
+QA_PROMPT = """Please answer the question using the chart image.
+Question: [QUESTION]
+Please first generate your reasoning process and then provide the user with the answer. Use the following format:
+<think>
+... your thinking process here ...
+</think>
+<answer>
+... your final answer (entity(s) or number) ...
+</answer>"""
+def get_image_from_url(image_url):
+    """Download an image and return it as a PIL Image, or None on failure."""
+    try:
+        response = requests.get(image_url, timeout=30)
+        response.raise_for_status()
+        return Image.open(BytesIO(response.content))
+    except Exception as e:
+        print(f"Error with image: {e}")
+        return None
+def get_answer(image_url, question, display=True):
+    """Ask the model a question about the chart at image_url and return its raw output."""
+    image = get_image_from_url(image_url)
+    # Bail out before displaying or encoding if the download failed.
+    if image is None:
+        return "Error downloading image"
+    if display:
+        plt.figure(figsize=(10, 8))
+        plt.imshow(image)
+        plt.axis('off')
+        plt.show()
+    # Re-encode as an RGB PNG so the data-URL MIME type always matches the
+    # encoded bytes, whatever the source image format was.
+    buffered = BytesIO()
+    image.convert("RGB").save(buffered, format="PNG")
+    encoded_image = base64.b64encode(buffered.getvalue()).decode('utf-8')
+    messages = [{
+        "role": "user",
+        "content": [
+            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded_image}"}},
+            {"type": "text", "text": QA_PROMPT.replace("[QUESTION]", question)}
+        ]
+    }]
+    # Greedy decoding (temperature=0) for reproducible answers.
+    response = llm.chat([messages], sampling_params=SamplingParams(temperature=0, max_tokens=500))
+    return response[0].outputs[0].text
+# Initialize the LLM (get_answer uses this module-level `llm`)
+llm = LLM(
+    model="bespokelabs/Bespoke-MiniChart-7B",
+    tokenizer_mode="auto",
+    max_model_len=15000,
+    tensor_parallel_size=1,
+    gpu_memory_utilization=0.9,
+    # Cap each image at 1600 * 28 * 28 pixels, i.e. about 1600 visual tokens
+    # (Qwen2.5-VL maps each 28x28-pixel block to one token).
+    mm_processor_kwargs={"max_pixels": 1600*28*28},
+    seed=2025,
+    trust_remote_code=True,
+)
+# Running inference
+image_url = "https://github.com/bespokelabsai/chartqa-examples/blob/main/images/ilyc9wk4jf8b1.png?raw=true"
+question = "How many global regions maintained their startup funding losses below 30% in 2022?"
+print("\n\n=================Model Output:===============\n\n", get_answer(image_url, question))
+```
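`get_answer` returns the raw generation. If the model follows `QA_PROMPT`, that text contains a `<think>...</think>` block followed by an `<answer>...</answer>` block. The following is a minimal sketch for pulling out only the final answer. `extract_answer` is a hypothetical helper, not part of vLLM or this repository. It returns `None` when no complete `<answer>` block is present, for example if generation stopped at `max_tokens=500` before the answer was written.

```python
ANSWER_OPEN = "<answer>"
ANSWER_CLOSE = "</answer>"


def extract_answer(model_output):
    """Hypothetical helper: return the text of the final <answer> block, or None.

    Searching from the *last* opening tag means an "<answer>" mentioned inside
    the <think> section is not mistaken for the final answer.
    """
    start = model_output.rfind(ANSWER_OPEN)
    if start == -1:
        return None  # no answer tag at all
    start += len(ANSWER_OPEN)
    end = model_output.find(ANSWER_CLOSE, start)
    if end == -1:
        return None  # tag opened but never closed, e.g. output truncated by max_tokens
    return model_output[start:end].strip()


# Usage with a made-up string in the requested format (not real model output):
sample = "<think>\nCount the regions below 30%.\n</think>\n<answer>\n3\n</answer>"
print(extract_answer(sample))  # prints 3
```

With the setup above, `extract_answer(get_answer(image_url, question))` would return only the final answer text.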
+# License
+This work is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+For commercial licensing, please contact company@bespokelabs.ai.
+# Citation
+```bibtex
+@misc{bespoke_minichart_7b,
+  title        = {{Bespoke-MiniChart-7B}: pushing the frontiers of open {VLMs} for chart understanding},
+  author       = {Liyan Tang and Shreyas Pimpalgaonkar and Kartik Sharma and Alexandros G. Dimakis and Mahesh Sathiamoorthy and Greg Durrett},
+  howpublished = {blog post},
+  year         = {2025},
+  url          = {https://huggingface.co/bespokelabs/Bespoke-MiniChart-7B},
+}
+```
+# Acknowledgements
+**Bespoke Labs** team:
+- Liyan Tang
+- Shreyas Pimpalgaonkar
+- Kartik Sharma
+- Alex Dimakis
+- Mahesh Sathiamoorthy
+- Greg Durrett
+*Model perfected at Bespoke Labs — where careful curation meets cutting‑edge modeling.*