Optimizing Model Inference Speed for Large HTML Inputs

#12
by chiralzhan - opened

Hello,

I'm currently running the ReaderLM model on an H20 GPU with the max_new_tokens parameter set to 4096, following the official guidelines. My typical inputs are HTML documents of 20 to 40 KB, and inference generally takes about 4 to 5 minutes. Is this duration normal? If so, could you recommend strategies to speed up inference?
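For reference, my code roughly follows the standard transformers pattern below (the model id, dtype, and file path here are simplified for illustration, not exact details of my script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jinaai/ReaderLM-v2"  # assumed model id, for illustration
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device)

# A 20-40 KB HTML page, as described above (path is a placeholder).
html = open("page.html", encoding="utf-8").read()

messages = [{"role": "user", "content": html}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

# max_new_tokens=4096 per the official guidelines; this step takes 4-5 minutes.
outputs = model.generate(input_ids, max_new_tokens=4096, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```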

Thank you for your assistance.

I believe you can do some simple cleaning of the HTML input first, e.g. stripping scripts, styles, comments, and other non-content tags. Basically, it works: a shorter prompt means fewer tokens for both prefill and generation, so inference gets faster. A rough sketch is below.
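Here is a minimal regex-based sketch of that kind of cleaning; this is an assumption about what "simple cleaning" could look like, not the official ReaderLM preprocessing, so adjust the patterns to your documents:

```python
import re

# Common noise in raw HTML that carries no content for markdown extraction.
SCRIPT_PATTERN = re.compile(r"<script[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)
STYLE_PATTERN = re.compile(r"<style[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)
COMMENT_PATTERN = re.compile(r"<!--.*?-->", re.DOTALL)
META_PATTERN = re.compile(r"<(meta|link)[^>]*>", re.IGNORECASE)

def clean_html(html: str) -> str:
    """Strip scripts, styles, comments, and meta/link tags to shrink the prompt."""
    html = SCRIPT_PATTERN.sub("", html)
    html = STYLE_PATTERN.sub("", html)
    html = COMMENT_PATTERN.sub("", html)
    html = META_PATTERN.sub("", html)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s{2,}", " ", html)
```

On a 20 to 40 KB page, removing scripts and styles alone can cut the input size substantially, which should shorten inference time accordingly.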
