81 5 173

Csaba Kecskemeti PRO

csabakecskemeti

https://devquasar.com/

csabakecskemeti

AI & ML interests

None yet

Recent Activity

updated a model about 19 hours ago

DevQuasar/deepseek-ai.DeepSeek-V3.1-GGUF

updated a model about 19 hours ago

DevQuasar/deepseek-ai.DeepSeek-V3.1-GGUF

updated a model about 19 hours ago

DevQuasar/deepseek-ai.DeepSeek-V3.1-GGUF

View all activity

Organizations

posted an update 2 months ago

Post

2891

Has anyone ever backed up a model to a sequential tape drive, or I'm the world first? :D
Just played around with my retro PC that has got a tape drive—did it just because I can.

5 replies

posted an update 3 months ago

Post

399

Deepseek R1 0528 Q2 locally.
(I believe it has overthinking it a bit :) )
https://youtu.be/Iqu5s9aFaXA?si=QWZe293iTKf_3ELU

DevQuasar/deepseek-ai.DeepSeek-R1-0528-GGUF

posted an update 5 months ago

Post

2100

Local Llama4 Maverick Q2
https://youtu.be/4F8g_LThli0?si=MGba2SUTHt6xYw3T
Quants uploading now

Big thanks to @ngxson !

posted an update 5 months ago

Post

1737

Why the 'how many r's in strawberry' prompt "breaks" llama4? :D

Quants DevQuasar/meta-llama.Llama-4-Scout-17B-16E-Instruct-GGUF

3 replies

posted an update 5 months ago

Post

3393

I'm collecting llama-bench results for inference with a llama 3.1 8B q4 and q8 reference models on varoius GPUs. The results are average of 5 executions.
The system varies (different motherboard and CPU ... but that probably that has little effect on the inference performance).

https://devquasar.com/gpu-gguf-inference-comparison/
the exact models user are in the page

I'd welcome results from other GPUs is you have access do anything else you've need in the post. Hopefully this is useful information everyone.

posted an update 5 months ago

Post

2403

Managed to get my hands on a 5090FE, it's beefy

| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | pp512 | 12207.44 ± 481.67 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | tg128 | 143.18 ± 0.18 |

Comparison with others GPUs
http://devquasar.com/gpu-gguf-inference-comparison/

replied to their post 5 months ago

Follow-up

With the smaller context length dataset the training has succeeded.

posted an update 5 months ago

Post

1838

GTC new model announcement now from Nvidia
nvidia/Llama-3_3-Nemotron-Super-49B-v1

GGUFs:
DevQuasar/nvidia.Llama-3_3-Nemotron-Super-49B-v1-GGUF

Enjoy!

reacted to clem's post with 🚀 5 months ago

Post

4738

We just crossed 1,500,000 public models on Hugging Face (and 500k spaces, 330k datasets, 50k papers). One new repository is created every 15 seconds. Congratulations all!

3 replies

posted an update 5 months ago

Post

589

Cohere Command-a Q2 quant
DevQuasar/CohereForAI.c4ai-command-a-03-2025-GGUF

6.7t/s on a 3gpu setup (4080 + 2x3090)

(q3, q4 currently uploading)

replied to their post 6 months ago

No success so far, the training data contains some larger contexts and it fails just before complete the first epoch.
(dataset: DevQuasar/brainstorm-v3.1_vicnua_1k)

If anyone has further suggestion to the bnb config (with ROCm on MI100)?
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.bfloat16
)

Now testing with my other dataset that is smaller seems I have a lower memory need
DevQuasar/brainstorm_vicuna_1k

replied to their post 6 months ago

It's failed by the morning, need to find more room to decrease the memory

replied to their post 6 months ago

The machine itself is also funny. This my my GPU test bench.
Now also testing the PWM fan control and jetkvm

posted an update 6 months ago

Post

839

Fine tuning on the edge. Pushing the MI100 to it's limits.
QWQ-32B 4bit QLORA fine tuning
VRAM usage 31.498G/31.984G :D

4 replies

replied to their post 6 months ago

QLORA model loaded in 4bits

replied to their post 6 months ago

Updated the post with GGUF (Q4,Q8) performance metrics

replied to their post 6 months ago

Good callout will add this evening
Llama 3 8b q8 was around 80t/s generation

posted an update 6 months ago

Post

1991

-UPDATED-
4bit inference is working! The blogpost is updated with code snippet and requirements.txt
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
-UPDATED-
I've played around with an MI100 and ROCm and collected my experience in a blogpost:
https://devquasar.com/uncategorized/all-about-amd-and-rocm/
Unfortunately I've could not make inference or training work with model loaded in 8bit or use BnB, but did everything else and documented my findings.

4 replies

replied to their post 6 months ago

@sometimesanotion you might have more experience with AMD than me :)

replied to their post 6 months ago

So far I'm managed to have a working bnb up:

(bnbtest) kecso@gpu-testbench2:~/bitsandbytes/examples$ python -m bitsandbytes
g++ (Ubuntu 14.2.0-4ubuntu2) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='63', rocm_version_tuple=(6, 3)
PyTorch settings found: ROCM_VERSION=63
The directory listed in your path is found to be non-existent: local/gpu-testbench2
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/2803,unix/gpu-testbench2
The directory listed in your path is found to be non-existent: /etc/xdg/xdg-ubuntu
The directory listed in your path is found to be non-existent: /org/gnome/Terminal/screen/6bd83ab2_fd9f_4990_876a_527ef8117ef6
The directory listed in your path is found to be non-existent: //debuginfod.ubuntu.com
WARNING! ROCm runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
SUCCESS!
Installation was successful!

It's able to load the model to vram, but inference fails:
Exception: cublasLt ran into an error!

This is the main problem with anything not NVIDIA. The software is painful!
Keep trying...

Csaba Kecskemeti PRO

AI & ML interests

Recent Activity

Organizations

csabakecskemeti's activity