ByteDance released Tar 1.5B and 7B: image-text in image-text out models, fully open-source 👏
ByteDance-Seed/tar-6864cf0d9fe59a3b91cc4260
They have an image tokenizer unified with text, and they de-tokenize using either of two models (LLM and diffusion)
The model is actually a full LLM (Qwen2), and the tokenizer converts images into tokens for it 🤯