---
datasets:
  - Dataseeds/DataSeeds.AI-Sample-Dataset-DSD
language:
  - en
pipeline_tag: image-text-to-text
---

πŸ–ΌοΈ BLIP Image Captioning β€” Finetuned (candra/blip-image-captioning-finetuned)

This model is a BLIP (Bootstrapping Language-Image Pretraining) model fine-tuned for image captioning. It takes an image as input and generates a descriptive caption; the example usage below also shows how to turn that caption into cleaned, hashtag-friendly keywords.
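
For a quick test without writing the full preprocessing loop, the checkpoint can also be loaded through the transformers image-to-text pipeline. This is a minimal sketch, assuming the pipeline resolves this BLIP checkpoint automatically:

from transformers import pipeline

# Quick-start sketch: the image-to-text pipeline wraps the processor and model
captioner = pipeline("image-to-text", model="candra/blip-image-captioning-finetuned")
result = captioner("IMAGE.jpg")  # accepts a local path, URL, or PIL.Image
print(result[0]["generated_text"])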

🔧 Model Details

  • Architecture: BLIP (BlipForConditionalGeneration)
  • Fine-tuning dataset: Dataseeds/DataSeeds.AI-Sample-Dataset-DSD
  • Language: English
  • Task: image captioning (image in, descriptive caption out)

🧪 Example Usage

from transformers import AutoProcessor, BlipForConditionalGeneration
import torch
from PIL import Image

# Load model and processor
processor = AutoProcessor.from_pretrained("candra/blip-image-captioning-finetuned")
model = BlipForConditionalGeneration.from_pretrained("candra/blip-image-captioning-finetuned")

# Load image
image_path = "IMAGE.jpg"
image = Image.open(image_path).convert("RGB")

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Preprocess and generate caption
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs.pixel_values.to(device)

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("Caption:", generated_caption)

# The fine-tuned model emits comma-separated keywords, so split on ", "
# and turn each unique keyword into a hashtag
words = generated_caption.lower().split(", ")
unique_words = sorted(set(words))
hashtags = ["#" + word.replace(" ", "") for word in unique_words]
print("Hashtags:", " ".join(hashtags))

📥 Input

  • Image (RGB format, e.g., .jpg, .png), loaded from a local file or a URL (see the sketch below)

📤 Output

  • Caption: a string describing the contents of the image.
  • Hashtags: a list of unique hashtags derived from the caption (the post-processing step can be wrapped in a small helper, sketched below).

📌 Example

Input Image
(example image omitted here)

Generated Caption

animal, lion, mammal, wildlife, zoo, barrel, grass, background

Hashtags

#animal #background #barrel #grass #lion #mammal #wildlife #zoo