kulia-moon/jasVLM-nanoVLM

Image-Text-to-Text

vision-language


kulia-moon committed 2 days ago

Commit f82d794 · verified · 1 Parent(s): 09bcd43

Update README.md

Files changed (1)

README.md +6 -4

README.md CHANGED

@@ -8,9 +8,8 @@ tags:
 - research
 - pytorch
 - vlm
-base_model:
-# - HuggingFaceTB/SmolLM2-135M
-- lusxvr/nanoVLM-222M
+datasets:
+- HuggingFaceM4/the_cauldron
 ---
 **nanoVLM** is a minimal and lightweight Vision-Language Model (VLM) designed for efficient training and experimentation. Built using pure PyTorch, the entire model architecture and training logic fits within ~750 lines of code. It combines a ViT-based image encoder (SigLIP-B/16-224-85M) with a lightweight causal language model (SmolLM2-135M), resulting in a compact 222M parameter model.
@@ -26,4 +25,7 @@ Follow the install instructions and run the following code:
 from models.vision_language_model import VisionLanguageModel
 model = VisionLanguageModel.from_pretrained("kulia-moon/jasVLM-nanoVLM")
-```
+```
+# Evaluation
+![eval](https://cdn-uploads.huggingface.co/production/uploads/67fb3a09be94c007ddfde83a/ivLetBNo7F_7G5hjT3Qx9.png)