We've not measured FLOPs, but we have a few plots that measure total generation time in this notebook: https://github.com/NVIDIA/kvpress/blob/main/notebooks/speed_and_memory.ipynb. For most presses, the compression computations are very light compared to the forward pass over the long context itself.
Happy you like it @julien-c! Feel free to share it on social media to raise awareness around the package 🤗