czczup committed on
Commit 7de506b · verified · 1 Parent(s): f40d456

Update README.md

Files changed (1)
  1. README.md +1 -21
README.md CHANGED
@@ -10,25 +10,12 @@ datasets:
  pipeline_tag: image-feature-extraction
  ---
 
- # Model Card for InternVL-14B-224px
-
- <p align="center">
-     <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/2yzk5wUY-obL6H4rKiHlU.webp" alt="Image Description" width="300" height="300">
- </p>
+ # InternVL-14B-224px
 
  [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
 
  [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#model-usage) [\[🌐 Community-hosted API\]](https://rapidapi.com/adushar1320/api/internvl-chat) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/675877376)
 
- | Model | Date | Download | Note |
- | ----------------------- | ---------- | ---------------------------------------------------------------------- | -------------------------------- |
- | InternViT-6B-448px-V1-5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | support dynamic resolution, super strong OCR (🔥new) |
- | InternViT-6B-448px-V1-2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2) | 448 resolution |
- | InternViT-6B-448px-V1-0 | 2024.01.30 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0) | 448 resolution |
- | InternViT-6B-224px | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px) | vision foundation model |
- | InternVL-14B-224px | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px) | vision-language foundation model |
-
-
  ## Model Details
  - **Model Type:** vision-language foundation model
  - **Support Tasks:** zero-shot image/video classification, image-text/video retrieval, image captioning
@@ -43,10 +30,8 @@ See this [document](https://github.com/OpenGVLab/InternVL/tree/main/clip_benchma
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/KfsrXioPU77T48sRb60oL.png)
 
-
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/q5UkfrEix6w3mnn_1w4ja.png)
 
-
  ## Model Usage
 
  **Note: the prefix `'summarize:'` and `tokenizer.pad_token_id = 0` are necessary. Their absence will lead to abnormal results.**
@@ -141,8 +126,3 @@ If you find this project useful in your research, please consider citing:
  year={2024}
  }
  ```
-
-
- ## Acknowledgement
-
- InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!
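
For context on the usage note quoted in the second hunk: the README requires the `'summarize:'` prefix and `tokenizer.pad_token_id = 0` when generating captions, and their absence leads to abnormal results. The sketch below shows where those two requirements sit in a standard Transformers loading flow; it is not the README's own snippet. The `CLIPImageProcessor` choice, the `example.jpg` path, and the `model.generate(...)` call with its keyword arguments are assumptions based on the usual `trust_remote_code` pattern, so defer to the full Model Usage section in the README for the exact code.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

path = "OpenGVLab/InternVL-14B-224px"

# Load the vision-language foundation model; its modeling code ships with the repo,
# hence trust_remote_code=True. A CUDA GPU is assumed here.
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=False)
tokenizer.pad_token_id = 0  # per the README note: omitting this leads to abnormal results

# Prepare one image (the file name is illustrative) and a caption prompt.
image = Image.open("example.jpg").convert("RGB")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

prompt = "summarize:"  # per the README note: the 'summarize:' prefix is mandatory
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs.input_ids.cuda()
attention_mask = inputs.attention_mask.cuda()

# Assumed captioning entry point; the exact call and keyword arguments are defined by
# the model's remote code, so check the full README rather than relying on this sketch.
outputs = model.generate(
    pixel_values=pixel_values,
    input_ids=input_ids,
    attention_mask=attention_mask,
    num_beams=5,
)
print(tokenizer.decode(outputs[0].cpu(), skip_special_tokens=True).strip())
```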