Why is Int4 slower than unquantized float32 and float16?

#3
by hujianmin - opened

```python
prompts = [
    "A B C D E",
    "one two three four five",
    "When I was just a lad of ten, my father said to",
    "Today I bought some",
    "A majestic tiger walking through ",
    "马龙是一名乒乓球",  # "Ma Long is a table tennis" (left for the model to complete)
    "Imagine a breathtaking fantasy landscape during the golden hour, where the sun is setting behind a range of majestic snow-capped mountains. The sky is painted in vibrant hues of orange, pink, and purple, with scattered clouds reflecting the warm sunlight. In the foreground, a crystal-clear river winds through a lush valley, its surface shimmering with golden light. On the riverbank, a small medieval village with stone cottages and thatched roofs is nestled among blooming cherry blossom trees, their petals gently falling into the water. A cobblestone path leads from the village to a grand, ancient castle perched on a hill, surrounded by dense, enchanted forests with glowing mushrooms and ethereal blue fireflies.",
    "中国的首都在",  # "The capital of China is" (left for the model to complete)
]
```
For these prompts with new_token_length=32, the int4 model takes 32 s, while float32 and float16 both take 12 s. The hardware is four A100-40GB-PCIE GPUs.
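To put the reported figures on a common scale, here is a minimal sketch that converts them to tokens per second, plus a small timing helper that could be used to re-measure each variant. It assumes the 32 s and 12 s are totals for all 8 prompts and that every prompt generates exactly 32 new tokens (no early stop at an end-of-sequence token). `generate_fn` is a hypothetical wrapper around whichever model is being tested; it is not a library API.

```python
import time
from typing import Callable, Sequence


def time_generation(
    generate_fn: Callable[[Sequence[str], int], object],
    prompts: Sequence[str],
    max_new_tokens: int = 32,
    warmup: int = 1,
) -> float:
    """Return wall-clock seconds for one call of generate_fn, after warm-up calls.

    generate_fn is a hypothetical wrapper (int4, float16 or float32 model).
    If it runs on a GPU it should block until generation has finished,
    otherwise the measured time only covers launching the work.
    """
    for _ in range(warmup):
        generate_fn(prompts, max_new_tokens)  # exclude one-time start-up costs
    start = time.perf_counter()
    generate_fn(prompts, max_new_tokens)
    return time.perf_counter() - start


def throughput(num_prompts: int, max_new_tokens: int, seconds: float) -> float:
    """Generated tokens per second, assuming each prompt yields exactly max_new_tokens tokens."""
    return num_prompts * max_new_tokens / seconds


if __name__ == "__main__":
    # Figures from the post: 8 prompts, new_token_length=32.
    n_prompts, new_tokens = 8, 32
    int4_s, fp_s = 32.0, 12.0
    print(f"int4:          {throughput(n_prompts, new_tokens, int4_s):.1f} tokens/s")
    print(f"fp16/fp32:     {throughput(n_prompts, new_tokens, fp_s):.1f} tokens/s")
    print(f"int4 slowdown: {int4_s / fp_s:.2f}x")
```

Under these assumptions the int4 run produces 8.0 tokens/s against about 21.3 tokens/s for the unquantized runs, roughly 2.67x slower. The warm-up call keeps one-time start-up work out of the measurement, so the comparison reflects steady-state generation speed.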
