---
license: mit
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: Taming-VQGAN
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 7.96
    - name: InceptionScore
      type: InceptionScore
      value: 115.9
    - name: LPIPS
      type: LPIPS
      value: 0.306
    - name: PSNR
      type: PSNR
      value: 20.2
    - name: SSIM
      type: SSIM
      value: 0.52
    - name: CodebookUsage
      type: CodebookUsage
      value: 0.445
---
This model is the Taming VQGAN tokenizer with a 10-bit vocabulary (1,024 codebook entries), converted into a format compatible with the MaskBit codebase. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.
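The numbers above fully determine the token layout. As a minimal sketch (plain arithmetic, not the MaskBit API), the downsampling factor and codebook size give the latent grid and token count per image:

```python
# Sketch of the token-grid arithmetic implied by this tokenizer's config.
# Variable names here are illustrative, not taken from the MaskBit codebase.

resolution = 256         # input image side length
downsample_factor = 16   # spatial downsampling of the VQGAN encoder
codebook_bits = 10       # 10-bit vocabulary

grid = resolution // downsample_factor  # latent grid side length: 16
num_tokens = grid * grid                # discrete tokens per image: 256
vocab_size = 2 ** codebook_bits         # codebook entries: 1024

print(f"latent grid: {grid}x{grid}, "
      f"tokens per image: {num_tokens}, vocab size: {vocab_size}")
```

So each 256×256 image is represented as a 16×16 grid of 256 tokens, each an index into the 1,024-entry codebook.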
You can find more details on the VQGAN in the original [repository](https://github.com/CompVis/taming-transformers) or [paper](https://arxiv.org/abs/2012.09841). All credits for this model belong to Patrick Esser, Robin Rombach and Björn Ommer.