Friday-VLM

Friday-VLM is a multimodal (image + text) LLM fine-tuned on image-and-text instruction data. Its architecture and config code live in this repo rather than in the transformers library, so callers must load the model with trust_remote_code=True.


Model variants

Repo ID               Precision                  File format   Typical VRAM*   Size on disk*
kevin510/friday       bf16 (full)                safetensors   100 %           100 %
kevin510/friday-fp4   fp4 (bitsandbytes 4-bit)   safetensors   ≈ 30 %          ≈ 25 %

* Relative to the bf16 variant.
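A back-of-the-envelope check of the size column, assuming weights dominate storage: bf16 stores 2 bytes per parameter and a 4-bit format 0.5 bytes, so fp4 weights come out at about a quarter of the bf16 size. The ≈4B parameter count is the Hub's reported model size; the helper names are illustrative, not part of this repo.

```python
# Illustrative helpers (not part of the Friday-VLM repo).
BYTES_PER_PARAM = {"bf16": 2.0, "fp4": 0.5}  # 16 bits vs 4 bits per weight


def weight_bytes(n_params: float, precision: str) -> float:
    """Raw weight storage only: ignores quantization scales, activations and KV cache."""
    return n_params * BYTES_PER_PARAM[precision]


def gib(n_bytes: float) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    n = 4e9  # ≈ 4B parameters
    for precision in ("bf16", "fp4"):
        print(f"{precision}: {gib(weight_bytes(n, precision)):.1f} GiB")
```

For 4e9 parameters this prints roughly 7.5 GiB for bf16 and 1.9 GiB for fp4. Real 4-bit checkpoints also store per-block scales and may keep some modules in higher precision, and VRAM additionally holds activations and the KV cache, which do not shrink with weight quantization; that is a plausible reason the VRAM ratio sits above the on-disk ratio.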

Dependencies

conda create --name friday python=3.12 -y
conda activate friday
pip install transformers torch torchvision deepspeed accelerate pillow einops timm
pip install bitsandbytes   # only needed for kevin510/friday-fp4
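A small sanity check, not shipped with the repo, that the packages above are importable in the active environment. It uses importlib.util.find_spec, which locates a top-level module without importing it. Note that pillow is imported as PIL. REQUIRED and missing_packages are illustrative names.

```python
import importlib.util

# pip distribution name -> top-level import name (they differ for pillow)
REQUIRED = {
    "transformers": "transformers",
    "torch": "torch",
    "torchvision": "torchvision",
    "deepspeed": "deepspeed",
    "accelerate": "accelerate",
    "pillow": "PIL",
    "einops": "einops",
    "timm": "timm",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose import name cannot be located (nothing is imported)."""
    return [pip for pip, mod in required.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing:", " ".join(missing) if missing else "none")
```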

Quick start

import torch
from PIL import Image
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "kevin510/friday"

# The model class and config are defined in the repo, hence trust_remote_code=True.
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",  # let accelerate place the weights on the available device(s)
)
model.eval()

# Chat-style prompt; <image> is the placeholder for the image.
prompt = "Describe this image."
user_prompt = f"<|user|><image>\n{prompt}\n<|assistant|>"
inputs = tok(user_prompt, return_tensors="pt").to(model.device)

image = Image.open("my_image.jpg").convert("RGB")

# Greedy decoding; the repo's generate() receives the PIL images via images=.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        images=[image],
    )

print(tok.decode(out[0], skip_special_tokens=False))
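The quick-start prompt follows a simple chat template: <|user|>, the <image> placeholder, the question, then <|assistant|>. Because the example decodes with skip_special_tokens=False, the printed text may include the prompt and special tokens. The helpers below are a hypothetical sketch for building the prompt and keeping only the answer; the stop tokens <|end|> and <|endoftext|> are assumed from the Phi tokenizer family and may need adjusting for this checkpoint.

```python
# Hypothetical helpers; tag strings mirror the quick-start prompt above.
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"
IMAGE_TAG = "<image>"
# Assumed end-of-turn markers (Phi-style); verify against the tokenizer's special tokens.
STOP_TOKENS = ("<|end|>", "<|endoftext|>")


def build_prompt(question: str) -> str:
    """Single-turn image prompt in the format used by the quick start."""
    return f"{USER_TAG}{IMAGE_TAG}\n{question}\n{ASSISTANT_TAG}"


def extract_reply(decoded: str, stop_tokens: tuple[str, ...] = STOP_TOKENS) -> str:
    """Keep the text after the last assistant tag, cut at the first stop token."""
    reply = decoded.rsplit(ASSISTANT_TAG, 1)[-1]
    for stop in stop_tokens:
        reply = reply.split(stop, 1)[0]
    return reply.strip()
```

Splitting on the last <|assistant|> tag avoids assuming how many prompt tokens generate() echoes back, which may differ from the tokenized prompt length once the image placeholder is expanded into vision tokens.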

Architecture at a glance

FastViT-HD ─▶ 3072-d patch embeddings ─▶ S2 6144-d patch embeddings ─▶ 2-layer MLP vision adapter (6144 → 3072)

(vision tokens, 3072-d) ──┐
                          ├─▶ Φ-4-mini-reasoning (3.8 B params, hidden = 3072)
<text tokens, 3072-d> ────┘            │
                                       └─ standard self-attention only; the language tower is frozen during fine-tuning
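A shape-level PyTorch sketch of the diagram, using small dimensions so it runs instantly. Everything beyond the dimensions shown in the diagram is an assumption: the S2 step is modelled as average-pooling a higher-resolution feature map back to the base grid and concatenating channels (how 3072-d becomes 6144-d when two scales are combined), the adapter is assumed to be Linear → GELU → Linear, vision tokens are assumed to replace the single <image> placeholder in the text embedding sequence, and the image token id is hypothetical. freeze() shows what a frozen language tower means in PyTorch terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def s2_merge(base: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
    """Assumed S2-style merge of two scales into one token sequence.

    base: (B, C, H, W) features of the original-resolution image.
    high: (B, C, H2, W2) features of an upscaled copy of the image.
    Returns (B, H*W, 2*C): each token carries both scales' channels.
    """
    h, w = base.shape[-2:]
    # Pool the high-resolution map down to the base grid so tokens line up.
    high = F.adaptive_avg_pool2d(high, (h, w))
    merged = torch.cat([base, high], dim=1)   # (B, 2C, H, W)
    return merged.flatten(2).transpose(1, 2)  # (B, H*W, 2C)


class VisionAdapter(nn.Module):
    """Hypothetical 2-layer MLP projector (Linear -> GELU -> Linear)."""

    def __init__(self, in_dim: int = 6144, out_dim: int = 3072) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.GELU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def splice_image_tokens(text_embeds: torch.Tensor, input_ids: torch.Tensor,
                        image_token_id: int, vision_tokens: torch.Tensor) -> torch.Tensor:
    """Replace the single <image> placeholder embedding with the vision tokens.

    text_embeds: (T, D), input_ids: (T,), vision_tokens: (N, D).
    Returns the (T - 1 + N, D) sequence fed to the language model.
    """
    pos = (input_ids == image_token_id).nonzero(as_tuple=True)[0]
    if pos.numel() != 1:
        raise ValueError("expected exactly one <image> placeholder")
    i = int(pos.item())
    return torch.cat([text_embeds[:i], vision_tokens, text_embeds[i + 1:]], dim=0)


def freeze(module: nn.Module) -> None:
    """Disable gradient updates for every parameter (e.g. a frozen language tower)."""
    for p in module.parameters():
        p.requires_grad_(False)


if __name__ == "__main__":
    C = 8  # stand-in for 3072 so the demo is tiny
    base = torch.randn(1, C, 4, 4)  # base-scale feature map
    high = torch.randn(1, C, 8, 8)  # feature map of a 2x upscaled image
    tokens = s2_merge(base, high)                                # (1, 16, 2C)
    vision = VisionAdapter(in_dim=2 * C, out_dim=C)(tokens)[0]   # (16, C)
    ids = torch.tensor([5, 99, 6, 7])  # 99 = hypothetical <image> token id
    seq = splice_image_tokens(torch.randn(4, C), ids, 99, vision)
    print(tokens.shape, vision.shape, seq.shape)
```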

Limitations & Responsible AI

Friday-VLM may hallucinate objects, invent facts, or reproduce societal biases. All variants share the same behaviour profile; quantisation does not filter or sanitise model outputs. Users must apply their own content-safety layer before deployment.

Citation

@misc{friday2025,
  title   = {Friday VLM: Efficient Instruction-Tuned Vision–Language Modelling},
  author  = {Your Name et al.},
  year    = {2025},
  url     = {https://huggingface.co/kevin510/friday}
}
Model size: 4B params · Safetensors · tensor types F32, BF16
