---
license: apache-2.0
datasets:
  - ILSVRC/imagenet-1k
  - bentrevett/caltech-ucsd-birds-200-2011
  - vaishaal/ImageNetV2
  - clip-benchmark/wds_imagenet_sketch
  - clip-benchmark/wds_imagenet-r
  - enterprise-explorers/oxford-pets
  - ethz/food101
  - clip-benchmark/wds_imagenet-a
language:
  - en
metrics:
  - accuracy
base_model:
  - openai/clip-vit-large-patch14
  - openai/clip-vit-base-patch32
pipeline_tag: zero-shot-image-classification
tags:
  - code
---

# LaZSL

This repository contains the code for the ICCV'25 paper "Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model".

The pre-print version is available at [arXiv].

## Requirements

First, install the dependencies with conda:

    conda install pytorch torchvision -c pytorch
    conda install matplotlib torchmetrics -c conda-forge
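
To verify the environment, a quick check like the following can be run (an illustrative snippet, not part of the repository's instructions):

```python
# Minimal environment check (illustrative; not part of the official setup).
import torch
import torchvision
import torchmetrics
import matplotlib

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("torchmetrics:", torchmetrics.__version__)
print("matplotlib:", matplotlib.__version__)
```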

## Preparing the Datasets

Please follow the instructions in `DATASETS.md` to construct the datasets.

## Running

To reproduce the accuracy results from the paper, edit the dataset directories in `load_OP.py` to match your local machine and set `hparams['dataset']` accordingly, then run `python main_OP.py`. All hyperparameters can be modified in `load_OP.py`.
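
As a rough illustration, the edits in `load_OP.py` might look like the sketch below. Only `hparams['dataset']` is mentioned above; the other keys (`data_dir`, `model_size`) and the example values are hypothetical placeholders, so check `load_OP.py` for the actual names.

```python
# Illustrative sketch of the edits in load_OP.py; the actual keys may differ.
hparams = {}

# Hypothetical key: point the loader at your local dataset root.
hparams['data_dir'] = '/path/to/datasets'

# Key referenced in the instructions above: which dataset to evaluate.
hparams['dataset'] = 'imagenet'

# Hypothetical key: CLIP backbone, matching the variants reported in Results.
hparams['model_size'] = 'ViT-B/32'
```

After editing, run `python main_OP.py` to start the evaluation.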

## Results

Results (accuracy, %) of our released models using various evaluation protocols on the following datasets:

| Dataset    | Acc (ViT-B/32) | Acc (ViT-B/16) | Acc (ViT-L/14) |
|------------|----------------|----------------|----------------|
| ImageNet   | 65.3           | 69.2           | 75.7           |
| CUB        | 56.5           | 60.3           | 66.1           |
| OxfordPets | 84.7           | 87.4           | 92.7           |
| Food101    | 85.9           | 89.7           | 93.5           |
| Places365  | 41.5           | 42.0           | 41.8           |
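
For context, the sketch below shows how plain zero-shot CLIP classification is typically scored; it is a generic baseline, not LaZSL's locally-aligned method, and it assumes the Hugging Face `transformers` library, which is not among the listed dependencies.

```python
# Generic zero-shot CLIP classification sketch (baseline CLIP, not LaZSL).
import torch
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class_names = ["tabby cat", "golden retriever"]        # replace with the dataset's class names
prompts = [f"a photo of a {c}" for c in class_names]

@torch.no_grad()
def predict(images):
    """Return the predicted class index for each PIL image."""
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True).to(device)
    logits = model(**inputs).logits_per_image           # shape: (num_images, num_classes)
    return logits.argmax(dim=-1)
```

Top-1 accuracy is then the fraction of images whose predicted index matches the ground-truth label.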

## Citation

If you find LaZSL useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry:

    @inproceedings{chen2025interpretable,
      title={Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model},
      author={Chen, Shiming and Duan, Bowen and Khan, Salman and Khan, Fahad Shahbaz},
      booktitle={ICCV},
      year={2025}
    }