	---
	language:
	- en
	pipeline_tag: text-generation
	---

	<p align="center">
	<img src="./Bespoke-Labs-Logo.png" width="550">
	</p>

	# Bespoke-MiniChart-7B

	<a href="https://playground.bespokelabs.ai/minichart">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/g-QaXrmPLYk5m3Hq5vFtr.png" width="200px" />
	</a>

	This is an open‑source chart‑understanding Vision‑Language Model (VLM) developed at [Bespoke Labs](https://www.bespokelabs.ai/) and maintained by [Liyan Tang](https://www.tangliyan.com/) and Bespoke Labs. It sets a new state‑of‑the‑art in chart question‑answering (Chart‑QA) for 7 billion‑parameter models, outperforming much larger closed models such as Gemini‑1.5‑Pro and Claude‑3.5 on seven public benchmarks.

	1. Blog Post: https://www.bespokelabs.ai/blog/bespoke-minichart-7b
	2. Playground: https://playground.bespokelabs.ai/minichart
	---

	# Example Outputs

	The examples below showcase how Bespoke-MiniChart-7B can perform both visual perception and textual reasoning.


	<p align="left">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/E5WGhi_fVNzCsrKeNeIs3.png" width="700">
	</p>

	<p align="left">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/bYKXRm3sfOdX3zd_5qUpK.png" width="700">
	</p>


	# Model Performance

	Bespoke-MiniChart-7B achieves state-of-the-art performance on chart understanding among models of similar size. It can also surpass closed models such as Gemini-1.5-Pro and Claude-3.5.

	<p align="left">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/5pejAyzPG_tRBU6FwH7PA.png" width="700">
	</p>

	We also compare the performance of our model finetuned using SFT+DPO vs SFT only.

	In the table below, M1 and M2 are models finetuned on 270K and 1M SFT examples, respectively, and Bespoke-MiniChart-7B is the model finetuned using SFT+DPO.

	<p align="left">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6444e4417a7b94ddc2d14e1d/WRsPs437niUrXmYtkRajG.png" width="700">
	</p>


	# Model Use

	[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1FEmlwGgn9209iQO-rs2-9UHPLoytwZMH?usp=sharing)

	The model is available on the playground here: https://playground.bespokelabs.ai/minichart

	You can also run the model with the following snippet:

	```python
	import requests
	from PIL import Image
	from io import BytesIO
	import base64
	import matplotlib.pyplot as plt
	from vllm import LLM, SamplingParams

	QA_PROMPT = """Please answer the question using the chart image.

	Question: [QUESTION]

	Please first generate your reasoning process and then provide the user with the answer. Use the following format:

	<think>
	... your thinking process here ...
	</think>
	<answer>
	... your final answer (entity(s) or number) ...
	</answer>"""

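	def get_image_from_url(image_url):
	    """Download an image and return it as a PIL Image, or None on failure."""
	    try:
	        response = requests.get(image_url, stream=True)
	        response.raise_for_status()
	        return Image.open(BytesIO(response.content))
	    except Exception as e:
	        print(f"Error with image: {e}")
	        return None

	def get_answer(image_url, question, display=True):
	    """Ask the model (the global `llm` created below) a question about a chart."""
	    image = get_image_from_url(image_url)

	    # Bail out before plotting or encoding if the download failed.
	    if image is None:
	        return "Error downloading image"

	    if display:
	        plt.figure(figsize=(10, 8))
	        plt.imshow(image)
	        plt.axis('off')
	        plt.show()

	    # Re-encode the image in its original format (JPEG if unknown).
	    image_format = image.format or 'JPEG'
	    buffered = BytesIO()
	    image.save(buffered, format=image_format)
	    encoded_image = base64.b64encode(buffered.getvalue()).decode('utf-8')
	    # Match the data URL's MIME type to the format actually used above.
	    mime_type = Image.MIME.get(image_format, "image/jpeg")

	    messages = [{
	        "role": "user",
	        "content": [
	            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded_image}"}},
	            {"type": "text", "text": QA_PROMPT.replace("[QUESTION]", question)}
	        ]
	    }]

	    # Greedy decoding (temperature=0) for a single conversation.
	    response = llm.chat([messages], sampling_params=SamplingParams(temperature=0, max_tokens=500))
	    return response[0].outputs[0].text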

	# Initialize the LLM
	llm = LLM(
	model="bespokelabs/Bespoke-MiniChart-7B",
	tokenizer_mode="auto",
	max_model_len=15000,
	tensor_parallel_size=1,
	gpu_memory_utilization=0.9,
	mm_processor_kwargs={"max_pixels": 16002828},
	seed=2025,
	trust_remote_code=True,
	)

	# Running inference
	image_url = "https://github.com/bespokelabsai/minichart-playground-examples/blob/main/images/ilyc9wk4jf8b1.png?raw=true"
	question = "How many global regions maintained their startup funding losses below 30% in 2022?"

	print("\n\n=================Model Output:===============\n\n", get_answer(image_url, question))
	```
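Because the prompt asks the model to wrap its reasoning in `<think>` tags and its final answer in `<answer>` tags, downstream code will usually want only the answer. The sketch below is one possible way to split the two. The names `ParsedOutput` and `parse_model_output` are hypothetical helpers, not part of the model or of vLLM, and the sample string is hand-written for illustration rather than real model output.

```python
import re
from typing import NamedTuple, Optional


class ParsedOutput(NamedTuple):
    """Hypothetical container for the two sections of a response."""
    thinking: Optional[str]
    answer: Optional[str]


# DOTALL lets "." match newlines, since both sections may span several lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_model_output(text: str) -> ParsedOutput:
    """Extract the <think> and <answer> sections; a missing section is None."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return ParsedOutput(
        thinking=think.group(1).strip() if think else None,
        answer=answer.group(1).strip() if answer else None,
    )


# Hand-written sample in the requested format (not real model output).
sample = "<think>\nplaceholder reasoning\n</think>\n<answer>\n42\n</answer>"
parsed = parse_model_output(sample)
print(parsed.answer)
```

If generation stops at `max_tokens` before the closing `</answer>` tag is produced, `answer` comes back as `None`, so callers should handle that case explicitly, for example by raising `max_tokens` and retrying.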

	---
	# Licence

	This work is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
	For commercial licensing, please contact [email protected].

	# Citation

	```
	@misc{bespoke_minichart_7b,
	title = {Bespoke-MiniChart-7B: pushing the frontiers of open VLMs for chart understanding},
	author = {Liyan Tang and Shreyas Pimpalgaonkar and Kartik Sharma and Alexandros G. Dimakis and Mahesh Sathiamoorthy and Greg Durrett},
	howpublished = {blog post},
	year = {2025},
	url={https://huggingface.co/bespokelabs/Bespoke-MiniChart-7B},
	}
	```

	# Acknowledgements

	Bespoke Labs team:

	- Liyan Tang
	- Shreyas Pimpalgaonkar
	- Kartik Sharma
	- Alex Dimakis
	- Mahesh Sathiamoorthy
	- Greg Durrett


	Model perfected at Bespoke Labs — where careful curation meets cutting‑edge modeling.