(I believe it has overthinking it a bit :) )
https://youtu.be/Iqu5s9aFaXA?si=QWZe293iTKf_3ELU
DevQuasar/deepseek-ai.DeepSeek-R1-0528-GGUF
Follow-up
With the smaller context length dataset the training has succeeded.
No success so far, the training data contains some larger contexts and it fails just before complete the first epoch.
(dataset: DevQuasar/brainstorm-v3.1_vicnua_1k)
If anyone has further suggestion to the bnb config (with ROCm on MI100)?
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.bfloat16
)
Now testing with my other dataset that is smaller seems I have a lower memory need
DevQuasar/brainstorm_vicuna_1k
It's failed by the morning, need to find more room to decrease the memory
Updated the post with GGUF (Q4,Q8) performance metrics
Good callout will add this evening
Llama 3 8b q8 was around 80t/s generation
@sometimesanotion you might have more experience with AMD than me :)
So far I'm managed to have a working bnb up:
(bnbtest) kecso@gpu-testbench2:~/bitsandbytes/examples$ python -m bitsandbytes
g++ (Ubuntu 14.2.0-4ubuntu2) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='63', rocm_version_tuple=(6, 3)
PyTorch settings found: ROCM_VERSION=63
The directory listed in your path is found to be non-existent: local/gpu-testbench2
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/2803,unix/gpu-testbench2
The directory listed in your path is found to be non-existent: /etc/xdg/xdg-ubuntu
The directory listed in your path is found to be non-existent: /org/gnome/Terminal/screen/6bd83ab2_fd9f_4990_876a_527ef8117ef6
The directory listed in your path is found to be non-existent: //debuginfod.ubuntu.com
WARNING! ROCm runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
SUCCESS!
Installation was successful!
It's able to load the model to vram, but inference fails:
Exception: cublasLt ran into an error!
This is the main problem with anything not NVIDIA. The software is painful!
Keep trying...