Optimizing Model Inference Speed for Large HTML Inputs

#12
by chiralzhan - opened

Hello,

I'm currently running the ReaderLM model on an H20 GPU with the max_new_tokens parameter set to 4096, following the official guidelines. My typical inputs are HTML documents of 20 to 40 KB, and inference generally takes about 4 to 5 minutes. Is this duration normal? If so, could you recommend strategies to speed up inference?
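For reference, my code roughly follows the standard transformers pattern below (the model id, dtype, and file path here are simplified for illustration, not exact details of my script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jinaai/ReaderLM-v2"  # assumed model id, for illustration
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device)

# A 20-40 KB HTML page, as described above (path is a placeholder).
html = open("page.html", encoding="utf-8").read()

messages = [{"role": "user", "content": html}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

# max_new_tokens=4096 per the official guidelines; this step takes 4-5 minutes.
outputs = model.generate(input_ids, max_new_tokens=4096, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```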

Thank you for your assistance.

I believe you can do some simple cleaning of the HTML input first, e.g. stripping scripts, styles, comments, and other non-content tags. Basically, it works: a shorter prompt means fewer tokens for both prefill and generation, so inference gets faster. A rough sketch is below.
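Here is a minimal regex-based sketch of that kind of cleaning; this is an assumption about what "simple cleaning" could look like, not the official ReaderLM preprocessing, so adjust the patterns to your documents:

```python
import re

# Common noise in raw HTML that carries no content for markdown extraction.
SCRIPT_PATTERN = re.compile(r"<script[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)
STYLE_PATTERN = re.compile(r"<style[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)
COMMENT_PATTERN = re.compile(r"<!--.*?-->", re.DOTALL)
META_PATTERN = re.compile(r"<(meta|link)[^>]*>", re.IGNORECASE)

def clean_html(html: str) -> str:
    """Strip scripts, styles, comments, and meta/link tags to shrink the prompt."""
    html = SCRIPT_PATTERN.sub("", html)
    html = STYLE_PATTERN.sub("", html)
    html = COMMENT_PATTERN.sub("", html)
    html = META_PATTERN.sub("", html)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s{2,}", " ", html)
```

On a 20 to 40 KB page, removing scripts and styles alone can cut the input size substantially, which should shorten inference time accordingly.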
