Seems deepseek-vl2-small and deepseek-vl2 cannot perform properly


The same code is used for all of the models:
import torch
from transformers import AutoModelForCausalLM
from deepseek_vl2.models import DeepseekVLV2Processor

vl_chat_processor = DeepseekVLV2Processor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
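
One quick check, in case it matters: confirm that trust_remote_code resolved to the VL2 class (the class name in the comment is what I expect, not something I verified):

# confirm remote code picked up the expected model implementation
print(type(vl_gpt))  # expecting something like DeepseekVLV2ForCausalLM
print(vl_gpt.dtype, vl_gpt.device)  # should report torch.bfloat16 on a cuda device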

conversation = [
    {
        "role": "<|User|>",
        # "<image>" marks where the processor inserts the image tokens
        "content": f"<image>\n{prompt}",
    },
    {"role": "<|Assistant|>", "content": ""},
]

pil_images = [image]  # "image" is a PIL.Image loaded earlier

prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images, 
    force_batchify=True,
    system_prompt=""
).to(vl_gpt.device)
print("prepare_inputs keys:", prepare_inputs.keys())

inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
print("inputs_embeds shape:", inputs_embeds.shape)
outputs = vl_gpt.language.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
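
If it helps with reproducing, this is the sanity check I run next to rule out a prompt-templating problem (assuming prepare_inputs exposes input_ids; the keys printed above should confirm that):

# the decoded prompt should contain the expanded image placeholder tokens
decoded_prompt = tokenizer.decode(
    prepare_inputs.input_ids[0].cpu().tolist(),
    skip_special_tokens=False,  # keep special tokens so the image tokens are visible
)
print(decoded_prompt[:500])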

Both deepseek-vl2-small and deepseek-vl2 fail to give a proper answer, yet this exact code works fine with the tiny model.
Is that because the example code has some problem, or is it outdated?
Thanks to anyone who can spot the problem.
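
For reference, my understanding of the example in the upstream README is below; the image path is just a placeholder, and load_pil_images comes from the deepseek_vl2 package, so please correct me if this is out of date:

from deepseek_vl2.utils.io import load_pil_images

conversation = [
    {
        "role": "<|User|>",
        "content": "<image>\nDescribe this image.",
        "images": ["./images/example.jpeg"],  # placeholder path
    },
    {"role": "<|Assistant|>", "content": ""},
]
pil_images = load_pil_images(conversation)  # loads the paths listed in the conversation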
