Possible to extend context to 1M tokens?

#5
by saireddy - opened

Can we use dual chunk flash attention to increase the context window to 1M tokens for this FP8 model as well?
