Cephalo-Gemma-3-4b

This checkpoint is fine-tuned more heavily on the biological materials and spider silk dataset than lamm-mit/Cephalo-Gemma-3-4b-it-04-15-2025.

Load model and do inference

import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from transformers.image_utils import load_image
from PIL import Image as PILImage

ckpt = "lamm-mit/Cephalo-Gemma-3-4b-it-04-16-2025"
model = Gemma3ForConditionalGeneration.from_pretrained(
    ckpt, device_map="auto", torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(ckpt)

# Load the input image and convert it to RGB
image = PILImage.open("./spiderweb.png").convert("RGB")

# Each role needs its own message dict: the system prompt first, then the user turn
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a materials scientist."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What does this image show? Provide a detailed analysis."}
        ],
    },
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

input_len = inputs["input_ids"].shape[-1]

generation = model.generate(**inputs, max_new_tokens=512, do_sample=False)
generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
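
The chat template takes a list with one dict per role, and a Python dict that repeats the "role" and "content" keys silently keeps only the last pair. The helper below is a minimal sketch of one way to build that list; `build_messages` and its parameter names are hypothetical and not part of transformers or this repository.

```python
from typing import Any, Dict, List, Optional


def build_messages(
    user_text: str,
    image: Optional[Any] = None,
    system_text: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return a chat message list with a separate dict for each role."""
    messages: List[Dict[str, Any]] = []

    # Optional system turn, kept in its own dict so it is not overwritten
    if system_text is not None:
        messages.append({
            "role": "system",
            "content": [{"type": "text", "text": system_text}],
        })

    # User turn: the image entry (if any) comes before the text prompt
    user_content: List[Dict[str, Any]] = []
    if image is not None:
        user_content.append({"type": "image", "image": image})
    user_content.append({"type": "text", "text": user_text})
    messages.append({"role": "user", "content": user_content})

    return messages


# Stand-in object so the sketch runs without an image file;
# in practice, pass the PIL image loaded from disk instead.
placeholder_image = object()

messages = build_messages(
    "What does this image show? Provide a detailed analysis.",
    image=placeholder_image,
    system_text="You are a materials scientist.",
)
print([m["role"] for m in messages])
```

The resulting list can be passed to `processor.apply_chat_template` in the same way as the hand-written `messages` above.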

Results:

The image shows a spider's web, which is a structure of silk, in a red-lit, glass-enclosed cube. The web is the result of a spider's natural behavior and is a complex, three-dimensional pattern. The cube, which is a 3D-printed structure, is the environment in which the spider has created the web. The red lighting and the glass enclosure are used to highlight the web and the cube, and the lighting and the cube's material (glass) are used to show the web's structure.

The spider's web is a natural and intricate design, and the cube is a man-made, 3D-printed structure. The image is a combination of the natural and the artificial, and the red lighting and the glass enclosure are used to show the web and the cube in a new and interesting way.

The image is a reminder of the beauty and complexity of the natural world and the possibilities of the artificial world. The spider's web is a natural and intricate design, and the cube is a man-made, 3D-printed structure. The image is a combination of the natural and the artificial, and the red lighting and the glass enclosure are used to show the web and the cube in a new and interesting way.

Reference

@article{Buehler_Cephalo_2024_journal,
  title={Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design},
  author={Markus J. Buehler},
  journal={Advanced Functional Materials},
  year={2024},
  volume={34},
  issue={49},
  doi={10.1002/adfm.202409531},
  url={https://advanced.onlinelibrary.wiley.com/doi/full/10.1002/adfm.202409531}
}

Model size: 4.97B params (Safetensors, tensor type BF16)