Hugging Face
Jeff Cook
jeffcookio
4 followers
·
134 following
sjuxax
AI & ML interests
None yet
Recent Activity
liked
a model
6 days ago
google/gemma-3-12b-it-qat-int4-unquantized
reacted to m-ric's post
6 days ago
New king of open VLMs: InternVL3 takes Qwen 2.5's crown! InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline. ➡️ Most vision language models (VLMs) these days are built like Frankenstein: take a good text-only Large Language Model (LLM) backbone and stitch a specific vision transformer (ViT) on top of it. Then the training is sequential: 1. Freeze the LLM weights while you train only the ViT to work with the LLM part, then 2. Unfreeze all weights and train them to work together. The Shanghai Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly the Qwen-2.5 series, did I tell you I'm a huge fan of Qwen? ❤️) and stitch on the ViT, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase. They claim this results in more seamless interactions between modalities. And the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents.
updated
a model
6 days ago
jeffcookio/granite-3.3-8b-instruct-gptqmodel-4b-64g
Organizations
None yet
models
5
Sort: Recently updated
jeffcookio/granite-3.3-8b-instruct-gptqmodel-4b-64g
Updated
6 days ago
•
70
jeffcookio/granite-3.2-8b-instruct-gptqmodel-4b-64g
Updated
8 days ago
•
2
jeffcookio/gemma-3-12b-it-abliterated-gptqmodel-4b-128g
Updated
10 days ago
•
31
jeffcookio/Mistral-Small-3.1-24B-Instruct-2503-HF-gptqmodel-4b-128g
Updated
Mar 21
•
101
•
2
jeffcookio/Qwen2.5-VL-7B-Instruct-W4A16-G128
Updated
Mar 8
•
3
datasets
None public yet