	---
	license: cc-by-4.0
	---

	# Model Card for LoTLIP ViT-B/16

	## Model Details

	### Model Description

	A LoTLIP ViT-B/16 model pre-trained on a 100M-scale dataset.



	### Direct Use

	The model can be used for zero-shot long text-image retrieval, short text-image retrieval, and image classification, among other tasks.
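
As a rough illustration of what zero-shot use looks like at the embedding level, the sketch below ranks candidate texts for each image by cosine similarity. It assumes the image and text embeddings have already been produced by the model; loading and encoding are handled by the linked repository and are not shown here. The helper names `l2_normalize` and `rank_texts` and the toy arrays are hypothetical and not part of the LoTLIP code.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products equal cosine similarity.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def rank_texts(image_emb, text_emb):
    """Return, for each image, text indices sorted from most to least similar."""
    sim = l2_normalize(image_emb) @ l2_normalize(text_emb).T  # (n_images, n_texts)
    return np.argsort(-sim, axis=1)

# Toy stand-ins for model outputs (hypothetical values: 2 images, 3 texts).
image_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
text_emb = np.array([[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]])
ranking = rank_texts(image_emb, text_emb)
```

For retrieval, the candidate texts are captions; for zero-shot classification, they would be one prompt per class, and the top-ranked index is taken as the predicted class.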


	## How to Get Started with the Model

	Use the [code](https://github.com/wuw2019/LoTLIP) to get started with the model.


	## Training Details

	### Training Data

	The model is trained on a 100M-scale dataset that contains long text-image pairs.


	## Evaluation

	Please refer to https://github.com/wuw2019/LoTLIP.

	### Testing Details

	#### Testing Data

	Testing uses [DCI](https://github.com/facebookresearch/DCI), [IIW](https://github.com/google/imageinwords/), and [ShareGPT4V](https://sharegpt4v.github.io/) for long text-image retrieval, and ImageNet-1k for classification.


	### Results

	| Model | Pre-training Data Scale | DCI I2T | DCI T2I | IIW I2T | IIW T2I | SV-10k I2T | SV-10k T2I |
	| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
	| LoTLIP-ViT-B-16 | 100M | 64.11 | 62.63 | 94.28 | 92.65 | 88.40 | 82.72 |
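
The card does not state which metric the I2T (image-to-text) and T2I (text-to-image) columns report. Assuming a recall@k-style retrieval metric over one-to-one image-text pairs, a minimal sketch of how such numbers could be computed from a similarity matrix is given below. The function `recall_at_k` and the toy matrix are hypothetical; the official evaluation lives in the linked repository.

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Fraction of queries whose matching candidate is among the top k.

    sim[i, j] is the similarity between query i and candidate j; the correct
    candidate for query i is assumed to be j == i (one-to-one pairs).
    """
    topk = np.argsort(-sim, axis=1)[:, :k]
    targets = np.arange(sim.shape[0])[:, None]
    return float((topk == targets).any(axis=1).mean())

# Toy image-by-text similarity matrix (hypothetical values).
sim = np.array([
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
])
i2t = recall_at_k(sim, k=1)    # images as queries, texts as candidates
t2i = recall_at_k(sim.T, k=1)  # texts as queries, images as candidates
```

Transposing the matrix swaps the roles of query and candidate, which is why the same function serves both directions.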





	## Citation


	BibTeX:

	```bibtex
	@inproceedings{LoTLIP,
	title={LoTLIP: Improving Language-Image Pre-training for Long Text Understanding},
	author={Wu, Wei and Zheng, Kecheng and Ma, Shuailei and Lu, Fan and Guo, Yuxin and Zhang, Yifei and Chen, Wei and Guo, Qingpei and Shen, Yujun and Zha, Zheng-Jun},
	booktitle={arXiv},
	year={2024}
	}
	```