DevQuasar


AI & ML interests

Open-Source LLMs, Local AI Projects: https://pypi.org/project/llm-predictive-router/

Recent Activity


csabakecskemeti
posted an update 9 days ago
Has anyone ever backed up a model to a sequential tape drive, or am I the world's first? :D
Just played around with my retro PC that has a tape drive; did it just because I can.
csabakecskemeti
posted an update 22 days ago
csabakecskemeti
posted an update 2 months ago
csabakecskemeti
posted an update 2 months ago
csabakecskemeti
posted an update 3 months ago
I'm collecting llama-bench results for inference with llama 3.1 8B q4 and q8 reference models on various GPUs. The results are the average of 5 executions.
The test systems vary (different motherboards and CPUs), but that probably has little effect on inference performance.

https://devquasar.com/gpu-gguf-inference-comparison/
The exact models used are listed on the page.

I'd welcome results from other GPUs if you have access to anything else not yet covered in the post. Hopefully this is useful information for everyone.
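The numbers llama-bench reports are a mean ± standard deviation over repeated runs, which is also how the 5-execution averages above are summarized. A minimal stdlib sketch of that aggregation (the run values here are hypothetical, not the published results):

```python
import statistics

# Tokens/sec from 5 repeated llama-bench runs (hypothetical values)
runs = [142.9, 143.1, 143.4, 143.2, 143.0]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)  # sample standard deviation across runs

# llama-bench prints each result in "mean ± stdev" form
print(f"{mean:.2f} \u00b1 {stdev:.2f}")  # prints 143.12 ± 0.19
```

Averaging several runs like this smooths out thermal and scheduling noise, which is why the single-run variance (the ± term) is worth reporting alongside the mean.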
csabakecskemeti
posted an update 3 months ago
Managed to get my hands on a 5090FE, it's beefy

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | pp512 | 12207.44 ± 481.67 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | tg128 | 143.18 ± 0.18 |

Comparison with other GPUs:
http://devquasar.com/gpu-gguf-inference-comparison/
csabakecskemeti
posted an update 3 months ago
csabakecskemeti
posted an update 3 months ago
csabakecskemeti
posted an update 3 months ago
Fine-tuning on the edge. Pushing the MI100 to its limits.
QwQ-32B 4-bit QLoRA fine-tuning.
VRAM usage: 31.498G / 31.984G :D
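A back-of-the-envelope check on why a 4-bit quantized 32B model fits in the MI100's 32 GB at all (the sizes below are assumptions for illustration, not measurements from the post):

```python
# Rough VRAM arithmetic for QLoRA on a ~32B-parameter model (assumed figures)
params = 32e9                  # approximate parameter count
bytes_per_param = 0.5          # 4-bit quantized weights = half a byte each

weights_gib = params * bytes_per_param / 1024**3
print(f"4-bit base weights: ~{weights_gib:.1f} GiB")  # ~14.9 GiB

# The remainder of the observed 31.5 GiB goes to the LoRA adapters and their
# optimizer state, activations, KV cache, and runtime overhead, which is why
# the card ends up nearly full even though the frozen weights are ~15 GiB.
```

This is exactly the QLoRA trade-off: the frozen base model is stored quantized, and only the small low-rank adapters are trained in higher precision.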
