Flash Attention 2 error with indic-parler-tts, could you kindly provide a solution?
from parler_tts import ParlerTTSForConditionalGeneration

model_name = "ai4bharat/indic-parler-tts"
model = ParlerTTSForConditionalGeneration.from_pretrained(
    model_name,
    attn_implementation="flash_attention_2",  # <-- enable Flash Attention 2
)
This gave the following error:
\default\Lib\site-packages\transformers\modeling_utils.py", line 1617, in _autoset_attn_implementation
cls._check_and_enable_flash_attn_2(
File "<..>\Lib\site-packages\transformers\modeling_utils.py", line 1736, in _check_and_enable_flash_attn_2
raise ValueError(
ValueError: T5EncoderModel does not support Flash Attention 2.0 yet.
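For reference, loading the model without Flash Attention 2 does work for me. Below is a minimal sketch of such a fallback load; the attn_implementation="sdpa" setting, torch.float16 dtype, and cuda:0 device are illustrative assumptions on my part, not confirmed to be supported or optimal for every sub-module of this model:

import torch
from parler_tts import ParlerTTSForConditionalGeneration

model_name = "ai4bharat/indic-parler-tts"
# Fallback load without Flash Attention 2; "sdpa" and float16 are assumptions,
# not confirmed choices for this model's text encoder and decoder.
model = ParlerTTSForConditionalGeneration.from_pretrained(
    model_name,
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
).to("cuda:0")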
Environment: Windows 11, torch 2.6.0, CUDA 12.6, with transformers and the other dependencies already installed; the GPU works with transformers (verified with a different script).
Unfortunately, without Flash Attention 2 (which is expected to bring a significant speedup), generation takes quite long on an RTX 3080 GPU even for 3 lines of Telugu text.
I need Flash Attention 2 to get the maximum speedup so that I can process a text file containing hundreds of lines, so resolving this Flash Attention 2 error would be very helpful.
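To give a sense of the intended workload, here is a minimal sketch of the per-line loop I plan to run, based on the generation example from the indic-parler-tts model card; the telugu_lines.txt file name, the description string, and the output file names are placeholders:

import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_name = "ai4bharat/indic-parler-tts"

model = ParlerTTSForConditionalGeneration.from_pretrained(model_name).to(device)
prompt_tokenizer = AutoTokenizer.from_pretrained(model_name)
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

# Placeholder voice description; the real one would describe the desired Telugu speaker.
description = "A clear, moderately paced female voice with no background noise."
desc = description_tokenizer(description, return_tensors="pt").to(device)

# Placeholder input file: one Telugu sentence per line.
with open("telugu_lines.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

for i, line in enumerate(lines):
    prompt = prompt_tokenizer(line, return_tensors="pt").to(device)
    audio = model.generate(
        input_ids=desc.input_ids,
        attention_mask=desc.attention_mask,
        prompt_input_ids=prompt.input_ids,
        prompt_attention_mask=prompt.attention_mask,
    )
    sf.write(f"line_{i:04d}.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)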
Thank you in advance