This is great! Florence 2 is an amazing model for its size, but it hallucinates a lot when it comes to positioning and color. That's reduced with this model, but I think it still persists.
Yatharth Gupta
Warlord-K
AI & ML interests
Generative AI, Natural Language Processing, Reinforcement Learning
Recent Activity
updated
a Space
about 2 months ago
Warlord-K/AI-Teller
published
a Space
about 2 months ago
Warlord-K/AI-Teller
updated
a model
3 months ago
Alpha-VLLM/Lumina-Image-2.0
Organizations
Warlord-K's activity

replied to
gokaygokay's
post
10 months ago

replied to
gokaygokay's
post
10 months ago
What are your thoughts on Florence 2? Do you think fine-tuning it on these datasets would help with the captioning task?

reacted to
gokaygokay's
post with ❤️
10 months ago
Post
5976
I've fine-tuned three types of PaliGemma image captioner models for generating prompts for Text2Image models. They generate captions similar to prompts we give to the image generation models. I used google/docci and google/imageinwords datasets for fine-tuning.
This one gives you longer captions.
gokaygokay/SD3-Long-Captioner
This one gives you medium-length captions.
https://huggingface.co/spaces/gokaygokay/SD3-Long-Captioner-V2
And this one gives you shorter captions.
https://huggingface.co/spaces/gokaygokay/SDXL-Captioner

replied to
their
post
11 months ago
Do you mean the consistency between generations? Could you elaborate a little?

replied to
their
post
11 months ago
That'd be really cool indeed!