
Yatharth Gupta

Warlord-K

https://www.linkedin.com/in/yatharth-g

AI & ML interests

Generative AI, Natural Language Processing, Reinforcement Learning

Organizations

replied to gokaygokay's post about 1 year ago

This is great! Florence 2 is an amazing model for its size, but it hallucinates a lot when it comes to positioning and color. That's reduced with this model, but I think it still persists.

replied to gokaygokay's post about 1 year ago

What are your thoughts on Florence 2? Do you think fine-tuning it on these datasets would help with the captioning task?

reacted to gokaygokay's post with ❤️ about 1 year ago

Post


I've fine-tuned three PaliGemma image captioner models for generating prompts for text-to-image models. They produce captions that resemble the prompts we give to image generation models. I used the google/docci and google/imageinwords datasets for fine-tuning.

This one gives you longer captions.

gokaygokay/SD3-Long-Captioner

This one gives you medium-length captions.

https://huggingface.co/spaces/gokaygokay/SD3-Long-Captioner-V2

And this one gives you shorter captions.

https://huggingface.co/spaces/gokaygokay/SDXL-Captioner

10 replies

replied to their post over 1 year ago

Do you mean the consistency between generations? Could you elaborate a little?

replied to their post over 1 year ago

That'd be really cool indeed!

posted an update over 1 year ago

Post


What are some areas where image generation models are currently lacking?

5 replies

Warlord-K's activity