---
license: mit
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: Taming-VQGAN
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 7.96
    - name: InceptionScore
      type: InceptionScore
      value: 115.9
    - name: LPIPS
      type: LPIPS
      value: 0.306
    - name: PSNR
      type: PSNR
      value: 20.2
    - name: SSIM
      type: SSIM
      value: 0.52
    - name: CodebookUsage
      type: CodebookUsage
      value: 0.445
---
This model is the Taming VQGAN tokenizer with a 10-bit vocabulary (1,024 codebook entries), converted into a format compatible with the MaskBit codebase. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256.
You can find more details on the VQGAN in the original [repository](https://github.com/CompVis/taming-transformers) or [paper](https://arxiv.org/abs/2012.09841). All credits for this model belong to Patrick Esser, Robin Rombach and Björn Ommer.
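
To make the numbers above concrete, here is a minimal sketch of the arithmetic they imply for the token grid. It assumes square 256×256 inputs and one codebook index per spatial location; the function name `token_grid` and its parameters are hypothetical and are not part of the MaskBit or taming-transformers code.

```python
def token_grid(image_size: int = 256, downsample: int = 16, bits: int = 10) -> dict:
    """Derive token-grid statistics from the tokenizer settings.

    Assumes a square input image and a single codebook index per
    latent position (illustrative helper, not a library API).
    """
    side = image_size // downsample   # latent grid side length
    num_tokens = side * side          # tokens per image
    codebook_size = 2 ** bits         # number of distinct token ids
    return {
        "grid": (side, side),
        "num_tokens": num_tokens,
        "codebook_size": codebook_size,
        "bits_per_image": num_tokens * bits,
    }


stats = token_grid()
print(stats)
```

Under these assumptions, each image maps to a 16×16 grid of 256 tokens, each drawn from 1,024 possible codes.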