Why is model inference so slow?
#29 opened by LuYinMiao
I spent more than 30 minutes to answer only 5 questions. The config is:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", torch_dtype="auto").to("cuda")

# Tokenize the question (text holds the question string, defined earlier)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate the answer with the model
outputs = model.generate(**inputs, max_length=1024)

# Decode the generated answer
outputs = tokenizer.decode(outputs[0], skip_special_tokens=True)
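
For reference, a minimal timing sketch that reuses model and inputs from the snippet above, assuming a single CUDA GPU; the max_new_tokens cap and the tokens-per-second printout are diagnostic additions, not part of the original config:

import time
import torch

# Confirm the model really sits on the GPU and in which dtype
print(torch.cuda.is_available(), next(model.parameters()).device, next(model.parameters()).dtype)

start = time.time()
# Cap only the newly generated tokens (max_length also counts the prompt)
outputs = model.generate(**inputs, max_new_tokens=1024)
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} new tokens in {elapsed:.1f}s ({new_tokens / elapsed:.1f} tok/s)")
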
Any help would be appreciated.