Cephalo-Gemma-3-4b
This checkpoint is fine-tuned more heavily on the biological materials and spider silk data set than lamm-mit/Cephalo-Gemma-3-4b-it-04-15-2025.
Load model and do inference
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from transformers.image_utils import load_image
from PIL import Image as PILImage
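# Checkpoint on the Hugging Face Hub
ckpt = "lamm-mit/Cephalo-Gemma-3-4b-it-04-16-2025"
# Load the model in bfloat16 and let Accelerate place it on the available device(s)
model = Gemma3ForConditionalGeneration.from_pretrained(
    ckpt, device_map="auto", torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(ckpt)
# Load the input image from disk and make sure it has three color channels
image = PILImage.open("./spiderweb.png").convert("RGB")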
# The system and user turns must be separate entries in the list
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a materials scientist."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What does this image show? Provide a detailed analysis."}
        ],
    },
]
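# Apply the chat template, tokenize, and move the tensors to the model's device
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)
# Remember the prompt length so that only newly generated tokens are decoded
input_len = inputs["input_ids"].shape[-1]
# Greedy decoding (do_sample=False), up to 512 new tokens
generation = model.generate(**inputs, max_new_tokens=512, do_sample=False)
generation = generation[0][input_len:]
decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)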
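
When several images or questions are analyzed, the chat structure used above can be assembled by a small helper. The following is a minimal sketch, not part of transformers or of this model's repository: the function name `build_messages` and its parameters are hypothetical and chosen only for illustration. It builds the same list of role/content dictionaries that is passed to `processor.apply_chat_template`, with the system turn made optional.

```python
from typing import Any, Dict, List, Optional


def build_messages(
    image: Any,
    question: str,
    system_prompt: Optional[str] = "You are a materials scientist.",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: assemble a chat with one image and one question.

    `image` is passed through unchanged (for example a PIL image); this
    function does not load or validate it.
    """
    messages: List[Dict[str, Any]] = []
    # Optional system turn, kept as its own list entry
    if system_prompt:
        messages.append(
            {"role": "system", "content": [{"type": "text", "text": system_prompt}]}
        )
    # User turn: the image first, then the text question
    messages.append(
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        }
    )
    return messages
```

Under these assumptions, `messages = build_messages(image, "What does this image show? Provide a detailed analysis.")` would reproduce the two-turn conversation from the example.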

Results:
The image shows a spider's web, which is a structure of silk, in a red-lit, glass-enclosed cube. The web is the result of a spider's natural behavior and is a complex, three-dimensional pattern. The cube, which is a 3D-printed structure, is the environment in which the spider has created the web. The red lighting and the glass enclosure are used to highlight the web and the cube, and the lighting and the cube's material (glass) are used to show the web's structure.
The spider's web is a natural and intricate design, and the cube is a man-made, 3D-printed structure. The image is a combination of the natural and the artificial, and the red lighting and the glass enclosure are used to show the web and the cube in a new and interesting way.
The image is a reminder of the beauty and complexity of the natural world and the possibilities of the artificial world. The spider's web is a natural and intricate design, and the cube is a man-made, 3D-printed structure. The image is a combination of the natural and the artificial, and the red lighting and the glass enclosure are used to show the web and the cube in a new and interesting way.
Reference
@article{Buehler_Cephalo_2024_journal,
  title={Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design},
  author={Markus J. Buehler},
  journal={Advanced Functional Materials},
  year={2024},
  volume={34},
  number={49},
  doi={10.1002/adfm.202409531},
  url={https://advanced.onlinelibrary.wiley.com/doi/full/10.1002/adfm.202409531}
}