Object Relation Transformer

The Object Relation Transformer is a Transformer-based image captioning model. You can find more details about the model in our NeurIPS 2019 paper.

This model repository contains two variants of the Object Relation Transformer, as well as a couple of baseline models. Details about all of these models are in the README of our GitHub repository.

Citation

If you find these models useful, please consider citing (no obligation at all):

@article{herdade2019image,
  title={Image Captioning: Transforming Objects into Words},
  author={Herdade, Simao and Kappeler, Armin and Boakye, Kofi and Soares, Joao},
  journal={arXiv preprint arXiv:1906.05963},
  year={2019}
}

Maintainers

License

The contents of this repository are (c) by Verizon Media.

The contents of this repository are licensed under a Creative Commons Attribution 4.0 International License.

You should have received a copy of the license along with this work. If not, see https://creativecommons.org/licenses/by/4.0/.
