Flash Attention 2 error with indic-parler-tts, could you kindly provide a solution?
from parler_tts import ParlerTTSForConditionalGeneration

model_name = "ai4bharat/indic-parler-tts"
model = ParlerTTSForConditionalGeneration.from_pretrained(
    model_name,
    attn_implementation="flash_attention_2",  # <-- enable Flash Attention 2
)
This gave the following error:
\default\Lib\site-packages\transformers\modeling_utils.py", line 1617, in _autoset_attn_implementation
cls._check_and_enable_flash_attn_2(
File "<..>\Lib\site-packages\transformers\modeling_utils.py", line 1736, in _check_and_enable_flash_attn_2
raise ValueError(
ValueError: T5EncoderModel does not support Flash Attention 2.0 yet.
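For reference, loading the model without Flash Attention 2 does work for me. Below is a minimal sketch of such a fallback load; the attn_implementation="sdpa" setting, torch.float16 dtype, and cuda:0 device are illustrative assumptions on my part, not confirmed to be supported or optimal for every sub-module of this model:

import torch
from parler_tts import ParlerTTSForConditionalGeneration

model_name = "ai4bharat/indic-parler-tts"
# Fallback load without Flash Attention 2; "sdpa" and float16 are assumptions,
# not confirmed choices for this model's text encoder and decoder.
model = ParlerTTSForConditionalGeneration.from_pretrained(
    model_name,
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
).to("cuda:0")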
Environment: Windows 11, torch 2.6.0, CUDA 12.6, with transformers and the other dependencies already installed; the GPU works with transformers (verified with a different script).
Unfortunately, without Flash Attention 2 (which is expected to bring a significant speedup), generation takes quite long on an RTX 3080 GPU even for 3 lines of Telugu text.
I need Flash Attention 2 to get the maximum speedup so that I can process a text file containing hundreds of lines, so resolving this Flash Attention 2 error would be very helpful.
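To give a sense of the intended workload, here is a minimal sketch of the per-line loop I plan to run, based on the generation example from the indic-parler-tts model card; the telugu_lines.txt file name, the description string, and the output file names are placeholders:

import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_name = "ai4bharat/indic-parler-tts"

model = ParlerTTSForConditionalGeneration.from_pretrained(model_name).to(device)
prompt_tokenizer = AutoTokenizer.from_pretrained(model_name)
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

# Placeholder voice description; the real one would describe the desired Telugu speaker.
description = "A clear, moderately paced female voice with no background noise."
desc = description_tokenizer(description, return_tensors="pt").to(device)

# Placeholder input file: one Telugu sentence per line.
with open("telugu_lines.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

for i, line in enumerate(lines):
    prompt = prompt_tokenizer(line, return_tensors="pt").to(device)
    audio = model.generate(
        input_ids=desc.input_ids,
        attention_mask=desc.attention_mask,
        prompt_input_ids=prompt.input_ids,
        prompt_attention_mask=prompt.attention_mask,
    )
    sf.write(f"line_{i:04d}.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)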
Thank you in advance