---
language:
- en
pipeline_tag: text-generation
---

<p align="center">
  <img src="./Bespoke-Labs-Logo.png" width="550">
</p>

# Bespoke-MiniChart-7B

<a href="https://playground.bespokelabs.ai/minichart">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/g-QaXrmPLYk5m3Hq5vFtr.png" width="200px" />
</a>

This is an open-source chart-understanding Vision-Language Model (VLM) developed at [Bespoke Labs](https://www.bespokelabs.ai/) and maintained by [Liyan Tang](https://www.tangliyan.com/) and Bespoke Labs. It sets a new state of the art in chart question answering (Chart-QA) among 7B-scale models, outperforming much larger closed models such as Gemini-1.5-Pro and Claude-3.5 on seven public benchmarks.

1. **Blog Post**: https://www.bespokelabs.ai/blog/bespoke-minichart-7b
2. **Playground**: https://playground.bespokelabs.ai/minichart

---

# Example Outputs

The examples below show how Bespoke-MiniChart-7B combines visual perception with textual reasoning.

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/E5WGhi_fVNzCsrKeNeIs3.png" width="700">
</p>

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/bYKXRm3sfOdX3zd_5qUpK.png" width="700">
</p>

# Model Performance

Bespoke-MiniChart-7B achieves state-of-the-art chart-understanding performance among models of similar size, and it can even surpass closed models such as Gemini-1.5-Pro and Claude-3.5.

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/5pejAyzPG_tRBU6FwH7PA.png" width="700">
</p>

We also compare our model, fine-tuned with SFT+DPO, against variants fine-tuned with SFT only.

In the table below, M1 and M2 are SFT-only models trained on 270K and 1M SFT examples, respectively, and Bespoke-MiniChart-7B is the model fine-tuned with SFT+DPO.

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/WRsPs437niUrXmYtkRajG.png" width="700">
</p>

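
DPO (Direct Preference Optimization, Rafailov et al., 2023) is the preference-tuning step that distinguishes Bespoke-MiniChart-7B from M1 and M2 in the table above. For reference, the standard DPO objective is reproduced below. Here $x$ is the prompt (for this model, a chart image plus a question), $y_w$ and $y_l$ are a preferred and a dispreferred response drawn from a preference dataset $\mathcal{D}$, $\pi_{\mathrm{ref}}$ is a frozen reference policy (typically the SFT model), $\sigma$ is the logistic function, and $\beta$ controls how far the tuned policy $\pi_\theta$ may drift from the reference. This is the generic formulation only. The card does not state the $\beta$ value, the reference model, or how the preference pairs were constructed for this model.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$
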
# Model Use

[Open in Colab](https://colab.research.google.com/drive/1FEmlwGgn9209iQO-rs2-9UHPLoytwZMH?usp=sharing)

The model is available in the playground: https://playground.bespokelabs.ai/minichart

You can also run the model with [vLLM](https://github.com/vllm-project/vllm) using the following snippet:

```python
import base64
from io import BytesIO

import matplotlib.pyplot as plt
import requests
from PIL import Image
from vllm import LLM, SamplingParams

QA_PROMPT = """Please answer the question using the chart image.

Question: [QUESTION]

Please first generate your reasoning process and then provide the user with the answer. Use the following format:

<think>
... your thinking process here ...
</think>
<answer>
... your final answer (entity(s) or number) ...
</answer>"""


def get_image_from_url(image_url):
    """Download an image and return it as a PIL Image, or None on failure."""
    try:
        response = requests.get(image_url, timeout=30)
        response.raise_for_status()
        return Image.open(BytesIO(response.content))
    except Exception as e:
        print(f"Error with image: {e}")
        return None


def get_answer(image_url, question, display=True):
    image = get_image_from_url(image_url)

    # Bail out before trying to display or encode a failed download.
    if image is None:
        return "Error downloading image"

    if display:
        plt.figure(figsize=(10, 8))
        plt.imshow(image)
        plt.axis("off")
        plt.show()

    # Re-encode in the image's own format and use the matching MIME type
    # in the data URL (PNG if the format is unknown).
    fmt = image.format or "PNG"
    buffered = BytesIO()
    image.save(buffered, format=fmt)
    mime_type = Image.MIME.get(fmt.upper(), "image/png")
    encoded_image = base64.b64encode(buffered.getvalue()).decode("utf-8")

    messages = [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded_image}"}},
            {"type": "text", "text": QA_PROMPT.replace("[QUESTION]", question)},
        ],
    }]

    # Greedy decoding; `llm` is the global engine initialized below.
    response = llm.chat([messages], sampling_params=SamplingParams(temperature=0, max_tokens=500))
    return response[0].outputs[0].text


# Initialize the LLM
llm = LLM(
    model="bespokelabs/Bespoke-MiniChart-7B",
    tokenizer_mode="auto",
    max_model_len=15000,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.9,
    mm_processor_kwargs={"max_pixels": 1600 * 28 * 28},
    seed=2025,
    trust_remote_code=True,
)

# Running inference
image_url = "https://github.com/bespokelabsai/minichart-playground-examples/blob/main/images/ilyc9wk4jf8b1.png?raw=true"
question = "How many global regions maintained their startup funding losses below 30% in 2022?"

print("\n\n=================Model Output:===============\n\n", get_answer(image_url, question))
```
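
The prompt asks the model to wrap its reasoning in `<think>...</think>` and its final answer in `<answer>...</answer>`, so downstream code usually needs only the answer span. Below is a minimal sketch using Python's standard `re` module. `parse_output` is a hypothetical helper (not part of vLLM or this repository), and `sample` is a made-up string in the requested format, not real model output.

```python
import re


def parse_output(text):
    """Return (reasoning, answer) extracted from the <think> and <answer> tags.

    Either element is None when its tags are absent, e.g. if generation was
    cut off before the closing tag. Hypothetical helper for illustration.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )


# Made-up string in the format requested by QA_PROMPT (not real model output).
sample = "<think>\nstep-by-step reasoning\n</think>\n<answer>\n42\n</answer>"
reasoning, final = parse_output(sample)
print(final)  # 42
```

With the snippet above, `parse_output(get_answer(image_url, question))` would yield `(reasoning, answer)`. Either element is `None` if its tags are missing, for example when generation stops at `max_tokens=500` before `</answer>` is emitted.
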
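
The setting `mm_processor_kwargs={"max_pixels": 1600*28*28}` bounds how many pixels, and therefore how many visual tokens, each chart image can use. The card does not describe the image processor, so the sketch below *assumes* a Qwen2-VL-style one: both sides are snapped to multiples of 28 px, the image is rescaled with its aspect ratio preserved to stay within `[min_pixels, max_pixels]`, and every 28×28 block becomes one visual token. Under that assumption, 1600 × 28 × 28 = 1,254,400 pixels caps each image at roughly 1,600 visual tokens, which fits inside `max_model_len=15000` together with the prompt and up to 500 generated tokens. The helper `estimate_visual_tokens` and its `min_pixels` default of 56 × 56 are assumptions for illustration, not an API of vLLM or of this model.

```python
import math


def estimate_visual_tokens(height, width, factor=28,
                           min_pixels=56 * 56, max_pixels=1600 * 28 * 28):
    """Estimate the resized (height, width) and visual-token count of an image.

    Assumes a Qwen2-VL-style resize: snap each side to a multiple of `factor`,
    then rescale (keeping the aspect ratio) into [min_pixels, max_pixels].
    Each factor x factor block counts as one visual token. Hypothetical helper.
    """
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Shrink: floor to multiples of `factor` so the area stays under the cap.
        beta = math.sqrt(height * width / max_pixels)
        h = max(factor, math.floor(height / beta / factor) * factor)
        w = max(factor, math.floor(width / beta / factor) * factor)
    elif h * w < min_pixels:
        # Enlarge: ceil to multiples of `factor` so the area reaches the floor.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w, (h // factor) * (w // factor)


print(estimate_visual_tokens(600, 800))    # (588, 812, 609)
print(estimate_visual_tokens(2000, 3000))  # (896, 1344, 1536)
```

Small charts pass through almost unchanged, while large ones are scaled down until they fit the pixel budget. Raising `max_pixels` preserves more fine detail (small tick labels, dense legends) at the cost of more visual tokens per request.
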

---

# Licence

This work is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
For commercial licensing, please contact [email protected].

# Citation

```bibtex
@misc{bespoke_minichart_7b,
  title = {Bespoke-MiniChart-7B: pushing the frontiers of open VLMs for chart understanding},
  author = {Liyan Tang and Shreyas Pimpalgaonkar and Kartik Sharma and Alexandros G. Dimakis and Mahesh Sathiamoorthy and Greg Durrett},
  howpublished = {blog post},
  year = {2025},
  url = {https://huggingface.co/bespokelabs/Bespoke-MiniChart-7B},
}
```

# Acknowledgements

**Bespoke Labs** team:

- Liyan Tang
- Shreyas Pimpalgaonkar
- Kartik Sharma
- Alex Dimakis
- Mahesh Sathiamoorthy
- Greg Durrett

*Model perfected at Bespoke Labs, where careful curation meets cutting-edge modeling.*