lamm-mit/cephalo

mjbuehler committed on May 26, 2024

Commit eb2e643 (verified) · 1 parent: dc99f54

Update README.md

Files changed (1)

README.md CHANGED

@@ -10,7 +10,7 @@ A novel aspect of Cephalo's development is the innovative dataset generation met
 Cephalo can interpret complex visual scenes and generating contextually accurate language descriptions and answer queries.
-The model is developed to process diverse inputs, including images and text, facilitating a broad range of applications such as image captioning, visual question answering, and multimodal content generation. The architecture combines a vision encoder model and an autoregressive transformer to process complex natural language understanding.
+The models are developed to process diverse inputs, including images and text, facilitating a broad range of applications such as image captioning, visual question answering, and multimodal content generation. The architecture combines a vision encoder model and an autoregressive transformer to process complex natural language understanding.
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/623ce1c6b66fedf374859fe7/kl5GWBP9WS0D4uwd1t3S7.png)
@@ -26,6 +26,8 @@ Cephalo provides a robust framework for multimodal interaction and understanding
   - Trained on Idefics-2 distilled image-text data from Wikipedia and scientific papers. Gives shorter answers, to the point, and generaly accurate.
 - [Cephalo-Idefics-2-vision-8b-beta](https://huggingface.co/lamm-mit/Cephalo-Idefics-2-vision-8b-beta)
   - Trained on GPT-4o distilled image-text data from Wikipedia and scientific papers. Gives longer answers, with enhanced reasoning. Can struggle with complex concepts.
+- [Cephalo-Llava-v1.6-Mistral-8b-alpha](https://huggingface.co/lamm-mit/Cephalo-Llava-v1.6-Mistral-8b-alpha)
+  - Trained on GPT-4o distilled image-text data from Wikipedia and scientific papers.
 ## Citation