I'm collecting llama-bench results for inference with Llama 3.1 8B q4 and q8 reference models on various GPUs. The results are averages of 5 executions. The rest of the system varies (different motherboard and CPU, but that probably has little effect on the inference performance).
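For reference, a run like the above can be reproduced with llama.cpp's `llama-bench` tool; a minimal sketch, assuming the quantized GGUF files are named as below (the model paths are placeholders, not the exact files used here):

```shell
# Benchmark Llama 3.1 8B at q4 and q8, averaging over 5 repetitions (-r 5),
# with all layers offloaded to the GPU (-ngl 99).
llama-bench -m llama-3.1-8b-q4_k_m.gguf -r 5 -ngl 99
llama-bench -m llama-3.1-8b-q8_0.gguf  -r 5 -ngl 99
```

`llama-bench` reports prompt-processing and token-generation throughput (t/s) per configuration, which is what the numbers here compare across GPUs.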