About training data?

#17
by animemory - opened

Hi, thanks for your work! How much training data did you use for Qwen-Image Lightning training?

Hi, we used about 420k items.

Hi, is that prompt-only data? And what about the Qwen edit model?

Yes. For the Qwen edit model, we generated 120k prompt-image pairs.

Got it! Thanks a lot for the details. By the way, do you have any tips for collecting the prompt data, e.g. classes, subjects, characters, etc.?

@animemory
We divide the images into several categories, e.g., human, scene, pure-text, and scene with text. For each category, we have some example prompts for the MLM. We let the MLM look at the image and decide whether an example prompt is applicable. If it is, the MLM rewrites the example prompt so that it is better aligned with the image; if not, the MLM chooses another example prompt.

For example, the input example prompt is something like "remove some object in the image", and we give it an image like "a girl wearing a t-shirt is drinking, sitting in front of a table". The MLM then outputs a prompt like "remove the table in this figure".
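
A minimal sketch of that selection loop, under loud assumptions: the example prompts in `EXAMPLE_PROMPTS`, the function name `make_edit_prompt`, and the stand-in `toy_mlm` are all hypothetical and only illustrate the flow. The real example prompts and model call are not shown in this thread. The image is represented by a caption string, and the fallback simply tries the next example in list order, which is a simplification of letting the MLM choose.

```python
from typing import Callable, Optional

# Hypothetical example prompts per category (the real lists are not given here).
EXAMPLE_PROMPTS = {
    "human": ["replace some clothing in the image", "remove some object in the image"],
    "scene": ["remove some object in the image", "replace the background of the image"],
    "pure-text": ["remove some text from the image"],
    "scene with text": ["replace some text in the image"],
}

# Assumed interface: the model takes (image, example_prompt) and returns a
# rewritten prompt, or None when the example does not apply to the image.
MLM = Callable[[str, str], Optional[str]]


def make_edit_prompt(image: str, category: str, mlm: MLM) -> Optional[str]:
    """Return the first example prompt the model accepts, rewritten for the image."""
    for example in EXAMPLE_PROMPTS[category]:
        rewritten = mlm(image, example)
        if rewritten is not None:
            return rewritten
    return None  # no example prompt was applicable to this image


def toy_mlm(image: str, example: str) -> Optional[str]:
    """Stand-in for a real model call; it only handles the case from this thread."""
    if example == "remove some object in the image" and "table" in image:
        return "remove the table in this figure"
    return None


caption = "a girl wearing a t-shirt is drinking, sitting in front of a table"
print(make_edit_prompt(caption, "human", toy_mlm))
```

With the toy model, the first "human" example is rejected and the second is rewritten, so the loop falls through to the applicable prompt.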

Below are some of the prompts used in distillation. We find the MLM does a good job, and each prompt is related to its corresponding reference image.

    "text_annot": "Add a French flag to the world map near Europe."
    "text_annot": "Replace the background with a serene sunset scene featuring warm orange and pink hues, with silhouettes of people strolling along the pier."
    "text_annot": "Add a small pond with lily pads in the background to create a more natural habitat for the snail."
    "text_annot": "Add a small, colorful flower next to the hand to enhance the visual interest."
    "text_annot": "Replace the cityscape with a serene countryside scene featuring rolling green hills, a quaint village with thatched-roof houses, and a clear blue sky."
    "text_annot": "Enhance the evening atmosphere by adding a warm glow to the buildings and sky."
    "text_annot": "Add a person wearing a high-visibility vest operating the machine near the top left corner for scale and context."
    "text_annot": "Add a small, colorful flower to the background near the hand."
    "text_annot": "Remove the graffiti from the red train car."
    "text_annot": "Remove the bottle from the image."
    "text_annot": "Keep the front view of the image."
    "text_annot": "Keep the front view of the image."
    "text_annot": "Replace the man's striped sweater with a solid olive green sweater."
    "text_annot": "Replace the woman's black sports bra with a white crop top and add a pair of black sunglasses on her head."
    "text_annot": "Replace the man's white shirt with a navy blue sweater vest."
    "text_annot": "Add a pair of black-rimmed glasses to the person's face and change the background to a cyberpunk style with neon lights and futuristic architecture."
    "text_annot": "Add a pair of sunglasses to the woman's face, positioned on her nose."
    "text_annot": "Add a leopard-print scarf around the woman's neck, positioning it slightly tilted to the right."
    "text_annot": "Replace the leopard print scarf with a solid black one."
    "text_annot": "Replace the doctor's white coat with a blue lab coat."

And some text-editing prompts:

    "text_annot": "append 'SIMON' at lower-center"
    "text_annot": "withdraw '很有意' from the text"
    "text_annot": "extend 'gj.0048 一直远离陆地,所以许多 大部分简直可以' at center"
    "text_annot": "enlarge 'restaurant' at lower-center"
    "text_annot": "convert 'Supplies' with 'Ban'"
    "text_annot": "排除 '染病没有能再' from the text"

Got it! Thanks a lot. By the way, will you release the distillation dataset 😆?

@animemory Hi, we are sorry that we are not allowed to release the dataset... 😆

o(╥﹏╥)o got it!
