Tips for building and running the gradio demo
#4, opened by tol93
To speed up building flash_attn:
env NVCC_THREADS=16 pip install -U flash-attn --no-build-isolation
For running gradio_demo.py:
pip install openai # if ENABLE_REFINE = True, which it is by default
pip install gradio huggingface_hub sentencepiece
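To confirm the installs above actually landed in the active environment before launching the demo, a quick check like the following may help. This is only a sketch: the list of packages and the assumption that each import name matches what gradio_demo.py uses (for example `flash_attn` for the `flash-attn` package) are mine, not taken from the repo.

```python
import importlib.util

# Assumed import names for the packages mentioned in this thread.
# openai is only needed when ENABLE_REFINE = True.
DEMO_PACKAGES = ["gradio", "huggingface_hub", "sentencepiece", "openai", "flash_attn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(DEMO_PACKAGES)
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a package without importing it, so the check stays fast and does not touch the GPU.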
Other notes:
- The gradio demo doesn't run with 8GB of VRAM. It appears 24GB is needed.
- You must accept the Meta EULA and be granted access to the model. For me, approval took somewhere between one and several minutes.
- Building flash_attn takes a VERY long time (hours). See the first tip above to speed it up. I only saw four jobs running in parallel with NVCC_THREADS=16, compared with 2 by default.
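Given the VRAM note above, it can save time to check the GPU before starting the demo. The sketch below queries `nvidia-smi` for total memory per GPU. The 24000 MiB threshold is my own rough reading of "24GB" (cards sold as 24GB can report slightly under 24576 MiB), and the function names are hypothetical helpers, not part of the demo.

```python
import subprocess

# Assumption: "24GB is needed" from the notes above, with a little slack.
REQUIRED_MIB = 24_000


def parse_vram_mib(smi_output):
    """Parse one integer (MiB) per line, skipping blank or non-numeric lines."""
    values = []
    for line in smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def query_vram_mib():
    """Return total memory in MiB for each GPU, or [] if nvidia-smi fails."""
    cmd = ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_vram_mib(result.stdout)


def enough_vram(vram_mib, required_mib=REQUIRED_MIB):
    """True if at least one GPU meets the assumed requirement."""
    return any(v >= required_mib for v in vram_mib)


if __name__ == "__main__":
    gpus = query_vram_mib()
    print("GPU memory (MiB):", gpus)
    print("likely enough for the demo:", enough_vram(gpus))
```

This only looks at total memory, not what is currently free, so other processes on the GPU can still cause an out-of-memory error.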