Rope scaling implementation

#11

by cvdbdo - opened Sep 28, 2023

Discussion

cvdbdo

Sep 28, 2023

Do you plan on implementing RoPE scaling in the near future?

For example, in transformers, something like:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    rope_scaling={"type": "dynamic", "factor": 2.0},
    device_map="auto",
)
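For context, here is a rough sketch of what the "dynamic" type with a given factor would compute. It assumes the dynamic NTK formula used by the Llama rotary embedding in transformers. The function names and the example numbers (base 10000, head dimension 128, 4096 trained positions) are illustrative assumptions, not Mistral's code.

```python
def dynamic_ntk_base(base, dim, seq_len, max_position_embeddings, factor):
    """Return the RoPE base to use for a sequence of length seq_len.

    Assumed formula (dynamic NTK scaling): the base is left untouched
    inside the trained context and grows once seq_len exceeds it.
    """
    if seq_len <= max_position_embeddings:
        return base
    scale = (factor * seq_len / max_position_embeddings) - (factor - 1)
    return base * scale ** (dim / (dim - 2))


def inverse_frequencies(base, dim):
    """Per-pair rotation frequencies 1 / base^(i / dim) for even i."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


# Hypothetical values, chosen only for illustration.
short_base = dynamic_ntk_base(10000.0, 128, 2048, 4096, 2.0)
long_base = dynamic_ntk_base(10000.0, 128, 8192, 4096, 2.0)
freqs = inverse_frequencies(long_base, 128)
```

Within the trained length the base is unchanged. Beyond it, the larger base lowers the rotation frequencies, so positions past the original context map to smaller angles.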

GaussianMixture

Sep 30, 2023

But don't they use a sliding-window attention mechanism for larger contexts?

lerela

Oct 5, 2023

That's not planned in the near future!

lerela changed discussion status to closed Oct 5, 2023
