# Friday-VLM
Friday-VLM is a multimodal (image + text) LLM fine-tuned on image-and-text instruction data. The architecture and configuration code live in this repo, so callers must load the model with `trust_remote_code=True`.
## Model variants
| Repo ID | Precision | File format | Typical VRAM\* | Size on disk\* |
|---|---|---|---|---|
| `kevin510/friday` | bf16 (full) | safetensors | 100 % | 100 % |
| `kevin510/friday-fp4` | fp4 (bitsandbytes 4-bit) | safetensors | ≈ 30 % | ≈ 25 % |

\* Relative to the full-precision bf16 variant.
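If you would rather quantise the full-precision checkpoint yourself instead of pulling `kevin510/friday-fp4`, a sketch along these lines should work. It uses the standard transformers `BitsAndBytesConfig` API (requires `bitsandbytes` to be installed and a CUDA device); the specific settings below are illustrative defaults, not values confirmed for this repo:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Standard bitsandbytes 4-bit (fp4) quantisation config; these values are
# common defaults, not settings published by the Friday-VLM authors.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",              # matches the fp4 variant's precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "kevin510/friday",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)
```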
## Dependencies
```bash
conda create --name friday python=3.12 -y
conda activate friday
pip install transformers torch torchvision deepspeed accelerate pillow einops timm
```
## Quick start
```python
import torch
from PIL import Image
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("kevin510/friday", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "kevin510/friday",
    trust_remote_code=True,
    device_map="auto",
)
model.eval()

# Chat-formatted prompt; the <image> placeholder marks where vision tokens are spliced in.
prompt = "Describe this image."
user_prompt = f"<|user|><image>\n{prompt}\n<|assistant|>"
inputs = tok(user_prompt, return_tensors="pt").to(model.device)

image = Image.open("my_image.jpg").convert("RGB")

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding
        images=[image],
    )

print(tok.decode(out[0], skip_special_tokens=False))
```
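With the standard `generate` API the returned sequence starts with the prompt tokens, so the `print` above echoes the prompt. To show only the assistant's reply, slice off the prompt; you can also enable sampling for more varied outputs. A sketch reusing `tok`, `model`, `inputs`, and `image` from above (the temperature value is illustrative, not an author-recommended setting):

```python
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,   # sample instead of greedy decoding
        temperature=0.7,  # illustrative value, tune for your use case
        images=[image],
    )

# Drop the echoed prompt tokens and the special tokens, keeping only the reply.
reply = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```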
## Architecture at a glance
```
FastViT-HD ─▶ 3072-d patch embeddings ─▶ S2 6144-d patch embeddings ─▶ 2-layer MLP vision adapter (6144 → 3072)

vision tokens (3072-d) ─┐
                        ├─▶ Φ-4-mini-reasoning (2.7 B params, hidden = 3072)
text tokens (3072-d) ───┘    (standard self-attention only; the language tower
                              is frozen during fine-tuning)
```
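For intuition, here is a minimal sketch of a 2-layer MLP adapter with the dimensions shown above (6144 → 3072). The class name, layer layout, and GELU activation are assumptions for illustration, not the repo's actual implementation:

```python
import torch
import torch.nn as nn

class VisionAdapter(nn.Module):
    """Hypothetical 2-layer MLP projecting S2 patch embeddings (6144-d)
    into the language model's hidden space (3072-d)."""

    def __init__(self, in_dim: int = 6144, out_dim: int = 3072):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, out_dim)
        self.act = nn.GELU()  # activation choice is an assumption
        self.fc2 = nn.Linear(out_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, 6144) -> (batch, num_patches, 3072)
        return self.fc2(self.act(self.fc1(x)))
```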
Limitations & Responsible AI
Friday-VLM may hallucinate objects, invent facts, or reproduce societal biases. All variants share the same behaviour profile; quantisation does not filter or sanitise model outputs. Users must apply their own content-safety layer before deployment.
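As one possible shape for such a layer, the sketch below wraps generation in a post-hoc moderation check. The `is_safe` predicate is a hypothetical placeholder; in practice you would call a real moderation model or service:

```python
def is_safe(text: str) -> bool:
    # Hypothetical placeholder: swap in a real moderation model or API.
    blocklist = ("example banned phrase",)
    return not any(term in text.lower() for term in blocklist)

def safe_generate(model, tok, inputs, image, **gen_kwargs):
    out = model.generate(**inputs, images=[image], **gen_kwargs)
    text = tok.decode(out[0], skip_special_tokens=True)
    return text if is_safe(text) else "[response withheld by content filter]"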
## Citation
```bibtex
@misc{friday2025,
  title  = {Friday VLM: Efficient Instruction-Tuned Vision–Language Modelling},
  author = {Your Name et al.},
  year   = {2025},
  url    = {https://huggingface.co/kevin510/friday}
}
```