Can we use dual chunk flash attention and increase the context window to 1M for this FP8 model as well?
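For reference, here is a minimal sketch of what I'd expect this to look like with vLLM, assuming the FP8 checkpoint follows the Qwen 1M recipe. The model id, context length, and GPU count below are placeholders, and the checkpoint's `config.json` would presumably still need a `dual_chunk_attention_config` block as in the Qwen 1M models — I haven't verified that the FP8 quantization interacts cleanly with this backend:

```python
import os

# vLLM selects the dual-chunk backend via an environment variable;
# it must be set before the engine is constructed. Older vLLM versions
# may also require the V0 engine (VLLM_USE_V1=0) for this backend.
os.environ["VLLM_ATTENTION_BACKEND"] = "DUAL_CHUNK_FLASH_ATTN"

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct-1M",  # placeholder; swap in the FP8 repo id
    max_model_len=1_010_000,               # ~1M tokens, as in the Qwen 1M recipe
    quantization="fp8",                    # assumption: vLLM-compatible FP8 weights
    tensor_parallel_size=4,                # adjust to available GPUs
    enforce_eager=True,
)

# Quick smoke test that the engine comes up and generates.
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```

Has anyone tried this combination?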