How much GPU memory is required for 32k context embedding?
#32
by Labmem009 - opened
I tried to use this model to get embeddings of long text, but it failed many times with OOM errors on 6×A100 GPUs using data parallelism (DP). Any suggestions for allocating memory for long texts?
For a 32k context, the model needs to run on an 80GB A100 GPU with float16/bfloat16 and FlashAttention enabled; the batch size also needs to be reduced to 1.
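As a rough sanity check on why 80GB is needed, you can do a back-of-envelope estimate: total memory ≈ model weights + per-layer activations, where FlashAttention keeps the attention cost linear in sequence length. The numbers below (7B parameters, hidden size 4096, 32 layers, an activation multiplier of 4) are illustrative assumptions, not specs from this thread or model card:

```python
# Back-of-envelope GPU-memory estimate for long-context embedding.
# All model dimensions below are hypothetical assumptions for illustration.

def estimate_gib(params=7e9,        # assumed parameter count
                 bytes_per_param=2, # float16 / bfloat16
                 hidden=4096,       # assumed hidden size
                 layers=32,         # assumed layer count
                 seq_len=32_768,    # 32k context
                 batch=1,           # batch size reduced to 1
                 act_factor=4):     # rough guess: ~4 hidden-sized tensors/token/layer
    weights = params * bytes_per_param
    # With FlashAttention, activation memory grows linearly with seq_len
    # (no materialized seq_len x seq_len attention matrix).
    activations = batch * seq_len * hidden * bytes_per_param * layers * act_factor
    return (weights + activations) / 2**30

print(f"~{estimate_gib():.0f} GiB")  # → ~45 GiB
```

Under these assumed sizes the estimate already approaches the 40GB tier before any optimizer state, framework overhead, or fragmentation, which is consistent with the advice to use an 80GB card, half precision, FlashAttention, and batch size 1.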