This PR addresses a bug that prevented flash attention 2 (FA2) from running with granite-speech-8b using HF transformers. The same bug was not present in the 2b version.

Upon closer inspection, the line `"_attn_implementation_autoset": true` was not present in config.json (though it was present in the 2b version). After adding this line, FA2 is functional again.
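For context, a minimal sketch of how FA2 is typically requested when loading a checkpoint like this with transformers; the Auto classes and checkpoint id below are assumptions for illustration, not taken from this thread:

```python
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

# Assumed checkpoint id, for illustration only.
model_id = "ibm-granite/granite-speech-3.3-8b"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # FA2 requires fp16/bf16 weights
    attn_implementation="flash_attention_2",  # fails without the config fix described above
    device_map="auto",
)
```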


Done in the latest revision, thank you!

gsaon changed pull request status to closed
