Possible to extend context to 1M tokens?

#5
by saireddy - opened

Can we use dual chunk flash attention to increase the context window to 1M tokens for this FP8 model as well?
