Optimizing Model Inference Speed for Large HTML Inputs
#12 · opened by chiralzhan
Hello,
I'm currently running the ReaderLM model on an H20 GPU with the max_new_tokens parameter set to 4096, following the official guidelines. My typical inputs are HTML documents of 20 to 40 KB, and inference generally takes about 4 to 5 minutes. Is this duration normal? If so, could you recommend strategies to speed up inference?
Thank you for your assistance.
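
For reference, here is roughly how I'm running it. This is a minimal sketch assuming the `jinaai/ReaderLM-v2` checkpoint and the plain transformers `generate` API; the exact checkpoint name, prompt, and decoding settings on my end may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; substitute whichever ReaderLM variant you actually use.
model_id = "jinaai/ReaderLM-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")

# A typical 20-40 KB HTML document.
html = open("page.html", encoding="utf-8").read()

# Chat-style prompt asking the model to convert the HTML to Markdown.
messages = [{
    "role": "user",
    "content": "Extract the main content from the given HTML and "
               f"convert it to Markdown format.\n```html\n{html}\n```",
}]
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```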
I believe you can do some simple cleaning of the HTML input first. Stripping non-content markup (scripts, styles, comments, inline SVGs) shrinks the prompt, so the model has far fewer input tokens to process. In practice, it works well.
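
A minimal sketch of what that cleaning could look like, using plain regex-based stripping. The patterns and the `clean_html` helper below are illustrative, not an official ReaderLM utility; tune them to your documents:

```python
import re

# Remove the heaviest non-content parts of a page before inference.
SCRIPT_PATTERN = re.compile(r"<script[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)
STYLE_PATTERN = re.compile(r"<style[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)
COMMENT_PATTERN = re.compile(r"<!--.*?-->", re.DOTALL)
SVG_PATTERN = re.compile(r"<svg[^>]*>.*?</svg>", re.IGNORECASE | re.DOTALL)
BASE64_IMG_PATTERN = re.compile(r'<img[^>]*src="data:image/[^"]*"[^>]*>', re.IGNORECASE)


def clean_html(html: str) -> str:
    """Strip scripts, styles, comments, inline SVGs, and base64 images."""
    for pattern in (SCRIPT_PATTERN, STYLE_PATTERN, COMMENT_PATTERN,
                    SVG_PATTERN, BASE64_IMG_PATTERN):
        html = pattern.sub("", html)
    # Collapse whitespace runs left behind by the removals.
    return re.sub(r"\s{2,}", " ", html)


# Example: clean the document before building the prompt.
raw = open("page.html", encoding="utf-8").read()
cleaned = clean_html(raw)
print(f"{len(raw)} bytes -> {len(cleaned)} bytes")
```

On script- and style-heavy pages this can cut the input size substantially, which shortens prefill and often reduces the number of tokens the model generates as well.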