FP4 quantization, INT4 crash on RTX 50XX

#2
by Askd234 - opened

How long does it take to train a Chroma model? And what did you use: Runpod, and which GPU?

Owner • edited Jun 7

FP4 quantization, INT4 crash on RTX 50XX

Yeah, I think you need an FP4 quant for Blackwell GPUs. I will probably create a new quant around v36 or so, and I'll try an FP4 quant (in addition to the INT4) when I do that.
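For illustration, here's a minimal sketch of what the INT4 quant above does to each weight group (not the repo's actual code, just the standard symmetric scheme): values are scaled so the largest magnitude maps to the top integer level, then rounded. FP4 differs by snapping to a small set of non-uniform float levels instead of uniform integer steps.

```python
# Illustrative sketch of symmetric INT4 quantization (assumed scheme,
# not the actual quantization code used for these checkpoints).

def quantize_int4(weights):
    """Map floats to integers in [-7, 7] with one per-group scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.80, 0.33, 0.05]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
# The largest-magnitude weight maps to -7 exactly; the rest round
# to the nearest of 15 uniform levels.
```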

How long does it take to train a Chroma model? And what did you use: Runpod, and which GPU?

I assume you mean creating the quantized version - that took about 12 hours on an H100. Here's specifically what I used:

  • Container Image: pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel
  • Container start command: bash -c 'apt update;DEBIAN_FRONTEND=noninteractive apt-get install openssh-server -y;mkdir -p ~/.ssh;cd $_;chmod 700 ~/.ssh;echo "$PUBLIC_KEY" >> authorized_keys;chmod 700 authorized_keys;service ssh start;sleep infinity'
  • Container Disk: 2900GB (Using container disk because some volumes seem to be network disks which are very slow. Large disk is needed for caching activations during optimization. You may need to increase this if you adjust the params in the script.)
  • Volume Disk: 1GB (this is unused, mount path doesn't matter)
  • Once it has started, SSH into it using the command you get from Runpod's "connect" button, and then run this script: https://gist.github.com/josephrocca/dd339fc20c18dd48524c57b6e4005486

It would be great if there were an fp4-q

I will hopefully do FP4 in the next few days. I did try it when I did the INT4 v38 quant, but I got an OOM error for some reason and didn't look into it much. Probably a simple fix.
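To show how an FP4 quant differs from the INT4 one, here's a hedged sketch of the E2M1 encoding that Blackwell's FP4 hardware accelerates (the level values are from the format definition; the grouping/scale scheme is an assumption for illustration, not the actual quantization code):

```python
# Illustrative FP4 (E2M1) quantization sketch. E2M1 has 16 codes:
# +/- {0, 0.5, 1, 1.5, 2, 3, 4, 6}, so levels are denser near zero,
# unlike INT4's uniform integer grid.
E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(weights):
    """Snap each weight to the nearest signed E2M1 level after scaling
    the group so its max magnitude lands on the top level (6.0)."""
    scale = max(abs(w) for w in weights) / 6.0

    def nearest(x):
        mag = min(E2M1_LEVELS, key=lambda lvl: abs(lvl - abs(x)))
        return mag if x >= 0 else -mag

    return [nearest(w / scale) for w in weights], scale

weights = [0.12, -0.80, 0.33, 0.05]
codes, scale = quantize_fp4(weights)
# Small values land on the fine-grained levels near zero, which is
# the main representational advantage over uniform INT4.
```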

I will be waiting patiently :pray:
