Interpretation of Texts/Sec, Speed Decrease with Text Length, and Speed Impact of Increasing the Number of Labels (Multi-Label)
Update: I did a benchmark on the 2080 Ti; the results can be found below.
Hi Moritz,
in the description you mention the speed of the model on an A100 in texts/sec. This led me to a number of questions that are very relevant for my social research.
What is the correct interpretation? Is the inference speed just TOKENS classified per second or complete texts?
Do I correctly assume that the mean text length of XNLI is around 35 tokens?
Does doubling the text length to 70 tokens increase the inference time linearly, or should I expect the quadratic growth of self-attention to kick in?
I will use a multi-label approach: how does the number of labels affect the inference time? As far as I understand, the classification is done by a few extra layers, so I wouldn't expect the inference time to grow linearly with the number of categories. What difference would you expect between 3 and 8 classes? (Multi-label, not multi-class.)
Is a performance estimate of roughly 1/3 of an A100 realistic for a 2080 Ti (with this model, 30-80 token texts)?
System: 2080 Ti with 11 GB VRAM
Based on the answers, I have to decide whether to design my coding scheme for very short texts or longer context units, and whether to train the model on more granular sub-categories (simpler) or a few more complex categories.
Thank you for your expertise and great research! The implications of your NLI framework for Social Science Research REALLY excited me.
I'm going back to work on my coding scheme now ;)
Update: I am still confused about the 1400 texts per second, but I benchmarked the 2080 Ti I can use (I only asked because I expected to run into huge problems accessing my uni's servers -- I didn't).
Test Settings:
- System: NVIDIA 2080 Ti (11 GB VRAM), 30 GB RAM, AMD EPYC 7551P with 12 threads assigned
- Sample size: 1000 texts per run (each text is processed once per hypothesis, according to Moritz)
- Text length: roughly 40, 80, or 160 tokens + the length of the hypotheses
- Category sets (as used in the run names below):
  - short: 3 categories
  - long: 7 categories
  - very long: 12 categories
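For reference, a minimal sketch of how a run like this can be timed with the transformers zero-shot pipeline (not my exact script; the model name, label wordings, and example text are placeholders):

```python
import time
from transformers import pipeline

# Sketch only: model, labels, and texts are placeholders, not my actual data.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli",
    device=0,  # the 2080 Ti
)

texts = ["Parliament extended the unemployment benefits last week."] * 1000
labels_long = [f"category {i}" for i in range(1, 8)]  # "long" = 7 categories

start = time.monotonic()
results = classifier(texts, candidate_labels=labels_long, multi_label=True, batch_size=32)
total = time.monotonic() - start
print(f"Total Time: {total:.6f} seconds")
```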
✨Conclusion:
- Rule of thumb within these parameters: the 2080 Ti classifies around 9-12k tokens per second (see the quick check after the results below) ✅
- Unexpectedly, within these sequence lengths, the impact of the text length is roughly linear (a possible explanation is sketched after the link below) 📈
- The impact of the number of categories is also linear 📈
- A larger batch size sometimes seems to decrease performance (or it's something else; I can't control the server that much and don't understand why) ❓
✨For the relation between sequence length and inference time, also see this benchmark someone else did:
https://github.com/renebidart/text-classification-benchmark
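A possible explanation for why the scaling looks linear rather than quadratic here: at these sequence lengths, the quadratic attention-score term is still small compared to the linear projection and feed-forward cost per layer. A rough back-of-the-envelope sketch (hidden size 768 is the DeBERTa-v3-base value; all constants are approximations):

```python
# Very rough per-layer FLOP estimate for a transformer encoder.
# Assumptions: hidden size d=768, FFN dim 4*d, constants ignored.
def per_layer_flops(n, d=768):
    attention_scores = 2 * n * n * d     # QK^T and attention-weights @ V (quadratic in n)
    projections = 4 * n * d * d          # Q, K, V and output projections (linear in n)
    feed_forward = 2 * n * d * (4 * d)   # two FFN linear layers (linear in n)
    return attention_scores + projections + feed_forward

for n in (40, 80, 160):
    print(n, round(per_layer_flops(n) / per_layer_flops(40), 2))
# -> ratios of roughly 1.0, 2.0, 4.1: almost linear at these lengths
```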
Benchmark results (1000 texts per run):

| Batch size | Run | Categories | Total time (s) |
|---|---|---|---|
| 32 | 40_long | 7 | 28.476549 |
| 32 | 80_short | 3 | 22.933300 |
| 32 | 80_long | 7 | 68.247466 |
| 32 | 160_long | 7 | 134.463643 |
| 64 | 40_short | 3 | 12.018720 |
| 64 | 40_long | 7 | 28.768317 |
| 64 | 80_short | 3 | 24.548978 |
| 64 | 80_long | 7 | 69.698780 |
| 64 | 160_short | 3 | 55.748241 |
| 64 | 160_long | 7 | 138.010980 |
| 32 | 40verylong | 12 | 47.187768 |
| 32 | 80verylong | 12 | 71.918067 |
| 32 | 160verylong | 12 | 135.829635 |
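Quick sanity check of the 9-12k tokens/s rule of thumb, using the 80_long run at batch size 32 (the ~10-token hypothesis length is an assumption):

```python
# Back-of-the-envelope check of the ~9-12k tokens/s figure.
texts = 1000
labels = 7                   # "long" label set
tokens_per_pass = 80 + 10    # text + hypothesis (hypothesis length is a guess)
total_time = 68.25           # seconds, from the table above

print(round(texts * labels * tokens_per_pass / total_time))  # ~9200 tokens/s
```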
Hey @thedamsch, quick answers to your questions:
What is the correct interpretation? Is the inference speed just TOKENS classified per second or complete texts?
=> it's complete texts per second
Do I correctly assume that the mean text length of XNLI is around 35 tokens?
=> possible, the texts in these standard NLI datasets are always quite short (note that the model needs to process the hypothesis + the text itself simultaneously)
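You can check the sequence length the model actually sees per forward pass by tokenizing a text together with one hypothesis (a sketch; the hypothesis wording is just an example):

```python
from transformers import AutoTokenizer

# Sketch: count the tokens the model sees for one text + one hypothesis.
tokenizer = AutoTokenizer.from_pretrained("MoritzLaurer/mDeBERTa-v3-base-mnli-xnli")
premise = "Workers protested in front of the parliament on Tuesday."
hypothesis = "This text is about labour policy."  # example wording
input_ids = tokenizer(premise, hypothesis)["input_ids"]
print(len(input_ids))  # premise + hypothesis + special tokens, processed in one pass
```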
Does doubling the text length to 70 tokens increase the inference time linearly, or should I expect the quadratic growth of self-attention to kick in?
=> it's a transformer with somewhat standard attention, so quadratic growth
I will use a multi-label approach: how does the number of labels affect the inference time? As far as I understand, the classification is done by a few extra layers, so I wouldn't expect the inference time to grow linearly with the number of categories. What difference would you expect between 3 and 8 classes? (Multi-label, not multi-class.)
=> Each additional label means one more forward pass through the model, so inference requirements increase linearly with the number of classes. Classifying 10 classes theoretically takes 5x longer than classifying 2 classes. (For inference time, it doesn't matter whether you use multi-label or multi-class.)
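To make the linear scaling concrete, here is a sketch of how the zero-shot NLI approach turns each candidate label into its own premise-hypothesis pair, i.e. one forward pass per label (model name, labels, and hypothesis template are just examples):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch: one forward pass per candidate label, so 8 labels cost roughly 8/3 of 3 labels.
name = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

text = "Parliament extended the unemployment benefits last week."
labels = ["economy", "welfare", "security"]  # 3 labels -> 3 forward passes

entail_id = model.config.label2id.get("entailment", 0)
contra_id = model.config.label2id.get("contradiction", 2)

for label in labels:
    hypothesis = f"This text is about {label}."  # example hypothesis template
    inputs = tokenizer(text, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    # multi-label scoring: softmax over entailment vs. contradiction for each label independently
    prob_entail = torch.softmax(logits[[entail_id, contra_id]], dim=-1)[0].item()
    print(label, round(prob_entail, 3))
```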
Is a performance estimation of 1/3 of an A100 on a 2080 Ti (with this model, 30-80 token length) realistic?
=> I'm not sure tbh