image processing is different from the github version

#18

by enhaoz - opened Feb 6

Discussion

enhaoz

Feb 6 • edited Feb 7

Hi team,
Thanks for the great work! While experimenting with llava-hf/llava-1.5-7b-hf and the GitHub version (liuhaotian/llava-v1.5-7b), I noticed that the image processing stage differs between the two, which leads to different generation results.
With llava-hf/llava-1.5-7b-hf, images seem to be cropped to a square without padding.

In the GitHub repo, by contrast, images are padded first and then cropped to a square, because of the field {"image_aspect_ratio": "pad"} in model.config.

Am I missing something?
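
To make the difference concrete, here is a minimal sketch of the geometry of the two behaviors described above: padding a non-square image out to a square versus center-cropping it to one. The function names are hypothetical and are not taken from either codebase. The sketch only computes coordinates, assuming the padding is centered and the crop is centered. It does not reproduce the exact preprocessing of either implementation.

```python
def pad_to_square_geometry(width, height):
    """Return (side, (left, top)): the square canvas side and the paste offset.

    Assumption: the image is centered on a canvas whose side is the longer edge.
    """
    side = max(width, height)
    return side, ((side - width) // 2, (side - height) // 2)


def center_crop_geometry(width, height):
    """Return the (left, top, right, bottom) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def kept_fraction_after_crop(width, height):
    """Fraction of the original pixels that survive a centered square crop."""
    side = min(width, height)
    return (side * side) / (width * height)


w, h = 640, 480
print(pad_to_square_geometry(w, h))    # (640, (0, 80))
print(center_crop_geometry(w, h))      # (80, 0, 560, 480)
print(kept_fraction_after_crop(w, h))  # 0.75
```

For a 640x480 input, padding keeps every pixel and adds 80 rows of fill above and below, while a direct center crop discards 80 columns on each side. The model therefore sees different content near the image borders, which is consistent with the differing generations.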

ArthurZ

Llava Hugging Face org Feb 8

cc @nielsr, pretty sure we can control that with the image processor, no?

JackBAI

Aug 27

Same question. I think there's a difference in the image processor.

Libertaz

Oct 4

image_processor is more like the original LLaVA solution; LlavaNextImageProcessor uses padding.
