Rope scaling implementation

#11
by cvdbdo - opened

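Do you plan on implementing RoPE scaling in the near future, like the option available in transformers? For example:

from transformers import AutoModelForCausalLM

# Request dynamic RoPE scaling with a factor of 2 when loading the model.
# Whether Mistral honors this setting depends on the installed transformers version.
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    rope_scaling={"type": "dynamic", "factor": 2.0},
    device_map="auto",
)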
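For reference, here is a minimal stdlib-only sketch of what `"type": "dynamic"` does. It follows the dynamic NTK formula that transformers uses in its Llama rotary embedding: the RoPE base is left unchanged inside the trained context and is enlarged only once the sequence grows past it. The function names, the default base of 10000, and the pairing of adjacent dimensions in `rope_rotate` are assumptions for illustration; this is not Mistral's or transformers' actual code (transformers, for instance, pairs dimension i with i + dim/2).

```python
import math


def rope_inv_freq(dim: int, base: float = 10000.0) -> list[float]:
    """Inverse frequencies base^(-2i/dim) for i = 0 .. dim/2 - 1."""
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


def dynamic_ntk_base(seq_len: int, max_pos: int, dim: int,
                     base: float = 10000.0, factor: float = 2.0) -> float:
    """RoPE base adjusted for dynamic NTK scaling.

    Unchanged while seq_len <= max_pos (the trained context length);
    beyond that, the base grows so the rotation frequencies slow down.
    """
    if seq_len <= max_pos:
        return base
    scale = (factor * seq_len / max_pos) - (factor - 1)
    return base * scale ** (dim / (dim - 2))


def rope_rotate(x: list[float], position: int, inv_freq: list[float]) -> list[float]:
    """Rotate each pair (x[2i], x[2i+1]) by angle position * inv_freq[i].

    Assumption: adjacent dimensions are paired; real implementations may
    pair dimensions differently, but each pair is still a 2-D rotation.
    """
    out: list[float] = []
    for i, freq in enumerate(inv_freq):
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


if __name__ == "__main__":
    dim, max_pos = 128, 4096  # hypothetical head size and trained context length
    for seq_len in (2048, 8192):
        base = dynamic_ntk_base(seq_len, max_pos, dim, factor=2.0)
        # The lowest frequency shrinks once seq_len exceeds max_pos.
        print(seq_len, round(base), rope_inv_freq(dim, base)[-1])
```

Because the rescaling only kicks in past the trained length, short prompts see exactly the original positional encoding.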

But doesn't Mistral already use a sliding-window attention mechanism to handle longer contexts?
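Sliding-window attention and RoPE scaling address different things: the window limits how far back each layer's attention reaches, while RoPE scaling changes how positions are encoded. Below is a minimal sketch of a sliding-window causal mask and of how stacking layers widens the effective receptive field. It assumes a window of `window` tokens that includes the current token; Mistral's exact boundary convention may differ, and the helper names are hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query i may attend to key j:
    causal (j <= i) and within the last `window` positions (i - j < window)."""
    return [[j <= i and i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


def stacked_reach(seq_len: int, window: int, num_layers: int) -> list[list[bool]]:
    """reach[i][j] is True when information at position j can reach
    position i after num_layers stacked sliding-window attention layers."""
    mask = sliding_window_mask(seq_len, window)
    reach = [row[:] for row in mask]  # one layer
    for _ in range(num_layers - 1):
        # Position i attends to k in this layer, and k already saw j earlier.
        reach = [
            [any(mask[i][k] and reach[k][j] for k in range(seq_len))
             for j in range(seq_len)]
            for i in range(seq_len)
        ]
    return reach


def earliest_visible(seq_len: int, window: int, num_layers: int) -> int:
    """Earliest position whose information reaches the last token."""
    last_row = stacked_reach(seq_len, window, num_layers)[seq_len - 1]
    return last_row.index(True)


if __name__ == "__main__":
    for layers in (1, 2, 3):
        print(layers, earliest_visible(8, 3, layers))
```

With `seq_len=8` and `window=3`, the last token sees back to position 5 after one layer, 3 after two, and 1 after three: each extra layer extends the reach by `window - 1` positions, even though no single layer attends further than the window.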

That's not planned in the near future!

lerela changed discussion status to closed
