Quantum Entanglement and the Sentient Toaster: Revolutionizing LLM Training

#3
by mradermacher - opened

I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.

-rw------- 1 root root 509G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf

I assume that is in GB and not GiB, in which case 474 GiB might fit, as we have 503 GiB of RAM (after subtracting RAM reserved for hardware), but it would be extremely tight given the RAM required for context.
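For reference, a quick sanity check of that conversion (plain arithmetic, nothing assumed beyond the 509 GB figure above):

```bash
# 509 GB (powers of ten) expressed in GiB (powers of two):
echo $(( 509 * 1000**3 / 1024**3 ))   # prints 474
```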

> I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.

Q6_K is fine for me. Q8_0 might not fit without offloading, and it is unclear if offloading is even possible. I don't think it's worth using RPC if Q6_K fits. As a bonus, there will be enough RAM left to keep quantization tasks running if we do Q6_K. If you already have Q8_0 locally you should give it a try and see if it fits, but if not, Q6_K is fine for me.
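For context, the RPC fallback we would rather avoid looks roughly like this (a sketch only; the helper host names, the port, and a build with the RPC backend enabled are all assumptions):

```bash
# On each helper machine (llama.cpp built with -DGGML_RPC=ON):
rpc-server -p 50052

# On the main machine, spread the model across the helpers:
llama-imatrix -m /tmp/snowflake-arctic-instruct.Q8_0.gguf \
    -f calibration.txt --rpc helper1:50052,helper2:50052
```

Network round-trips make this far slower than local RAM, which is why fitting Q6_K locally is preferable.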

I just checked, and you do have it locally under /tmp/snowflake-arctic-instruct.Q8_0.gguf, so please give it a try and see if it fits. I believe it should fit if nothing else is running, as the model has such a small number of layers. If it doesn't fit, use Q6_K instead.

474G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf

I'll try an offload of 1 and 0, then Q6. Hopefully it does not crash.

I think you have to finish or kill the frozen quantisation tasks first. They are using a lot of reserved RAM (not cached RAM that can be taken away).
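The distinction matters because only the page cache is reclaimable on demand; a quick way to see both sides (standard tools, nothing specific to this setup assumed):

```bash
# "available" includes cache that can be dropped on demand; the RSS of a
# frozen quantize process is reserved and only comes back when it exits.
free -g
ps -eo pid,rss,comm --sort=-rss | head -5   # biggest resident-memory processes
```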

So, despite it listing both GPUs, it only allocated something on GPU0 (19GB). Otherwise, top says the process uses 435.6g, which is good, because I forgot to resume/stop the running quantize. I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.

457.4g after warming up.

> So, despite it listing both GPUs, it only allocated something on GPU0 (19GB)

llama.cpp uses both GPUs for imatrix but only offloaded to one because you set -ngl 1 and it can only offload on a per-layer basis. Also, since when are quantisation tasks using the GPUs?
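A minimal sketch of what that looks like on the command line (file names assumed; recent llama.cpp builds call the tool llama-imatrix):

```bash
# Offloading is per whole layer: -ngl 1 puts exactly one layer on a GPU,
# -ngl 0 keeps everything in system RAM; the CUDA backend still
# initializes both visible GPUs either way.
llama-imatrix -m /tmp/snowflake-arctic-instruct.Q8_0.gguf \
    -f calibration.txt -o snowflake-arctic-instruct.imatrix -ngl 1
```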

[screenshot: grafik.png]

> I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.

I'm not so sure about that. Keep in mind that imatrix maps the model with mmap, and those cached pages can be taken away by other processes, like quantisation tasks, that use reserved memory.

[screenshot: grafik.png]

dstat shows a relatively high disk read rate, so imatrix might now be streaming from SSD:

[screenshot: grafik.png]

Yes, it is clearly streaming from SSD now:

[screenshot: grafik.png]

Once the quantisation tasks are interrupted, it should work without SSD streaming again.
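For anyone reproducing this, the check is just watching disk and memory counters side by side (standard dstat invocation; the interval is an arbitrary choice):

```bash
# Sustained multi-hundred-MB/s reads while imatrix is running mean the
# mmap'ed model pages are being evicted and re-read from SSD instead of
# staying resident in the page cache.
dstat --disk --mem 5
```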

I don't think we ever rebooted rich1 after we had to reinstall it after the LXC corruption incident.

No, but rich1 and the VM rebooted multiple times before, and once after, and the only times that file was created were when I initially ran my script to configure wireguard and other stuff (i.e. twice only). I can only imagine some script went around and deleted either all 0-size files or any file starting with tmp.* - just very weird. But who knows, maybe whatever script was run to essentially destroy rich1 also ran a find over the whole disk.
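Purely as an illustration of that speculation, a cleanup of either kind would be a one-liner (entirely hypothetical; nothing in the logs confirms what actually ran):

```bash
# Either of these would have silently removed the marker file:
find / -xdev -type f -size 0 -delete        # delete all empty files
find / -xdev -type f -name 'tmp.*' -delete  # delete tmp.* leftovers
```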

The only evidence is that something mucked with that directory on Jan 28th, so it's unlikely to have been something that happened before. I was lucky that I made a copy of the queue just in case when it went down; otherwise restoring the jobs would have been... difficult.

Thanks. Great to finally see rich1 working again.

Yeah, I was getting a bit desperate - nico1 much less than 50% usable for weeks, rich1 gone, and an unprecedented number of models, and big ones too (I mean 70B..130B, not DeepSeek), made for very tense moments. All in all, it's making good progress despite everything, and we even made a tiny bit of progress on the nice 1000+ models.

Why does it say:

0   66 si Virtuoso-Medium-v2                           error/255 repo create

The repository clearly exists under https://huggingface.co/mradermacher/Virtuoso-Medium-v2-GGUF - it is supposed to upload static quants to that repo, as the status si shows.

Edit: Now that the imatrix is done, it shows sI as status but is still stuck at error/255 repo create. Luckily it just skips this task and works on other tasks in the meantime.
Edit: Ah nice, it either fixed itself or you manually fixed it. In any case, the model is now getting quantized.

Last night and also this morning HF had enormous timeout problems. Everything was affected, including web page loading. It's not fully fixed yet, but much better. I need to manually retry when it fails at this step.

Ah, and yes, if "s" is in the flags, it will never try imatrix quanting first.

Oh, and btw, Hetzner sometimes has good offers, which might or might not be something to consider for Richard, if he actually pays €250/month. Can't see an obvious candidate, but I didn't look long, and the offers change considerably over time, e.g.

https://www.hetzner.com/sb/#price_from=180&price_to=250&cpuType=AMD&search=threadripper

All of these are a bit faster than his box, and cheaper, afaics.

@mradermacher For a few days I have no longer been able to pause GPUs using echo pause GPU-188a5143-db69-7058-63b5-f2f1d2354f91 >/dev/tcp/10.28.1.1/16713 or the nico1-pause script. There is no error, but nothing happens on the scheduler when I do so. I just ran the nico1-pause script and it correctly interrupted all the tasks, but the status page doesn't show that nico1 is paused and tasks still run. The same goes for GPU pausing: the pauses don't appear on the status page and just get ignored. For GPUs I could at least still use the /tmp/pause flag, but for nico1-pause there is no alternative for me to use.

I'll look into it - I replaced the other side of that echo with something else so we can use it to queue and manage jobs, so obviously, it must be subtly broken.
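For the curious, the receiving side of that echo only needs to be a line-oriented TCP listener; a toy sketch (entirely assumed - the real scheduler is not public, and netcat flag spellings vary between implementations):

```bash
# Accept one-line commands such as "pause <GPU-UUID>" on port 16713
# and hand them to a dispatcher; the real handler also queues jobs.
while true; do
    nc -l -p 16713 | while read -r cmd arg; do
        echo "received: $cmd $arg"   # dispatch here
    done
done
```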

Both should work again - I also paused nico1 and removed the override files (which I hope was your intent).
