ValueError with multiple A100 GPUs

by saireddy - opened

Is anyone else facing this issue with multiple A100 GPUs?
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device()}` or `device_map={'':torch.xpu.current_device()}`
I am using "auto" for the device map and still hitting this issue.

Google org

Hi @saireddy, could you please have a look at this similar issue? This seems to be a duplicate. Please let us know if the issue still persists. Thank you.
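
For reference, the error text itself points at the usual workaround: place the whole 8-bit model on the GPU that the current process trains on instead of using "auto", which may spread the model across several GPUs. Below is a minimal sketch of that idea, assuming training is launched with one process per GPU (for example via torchrun or accelerate launch, which set the LOCAL_RANK environment variable). The helper name `per_process_device_map` and the model id in the commented usage are hypothetical placeholders, not something from this thread.

```python
import os


def per_process_device_map(default_index: int = 0) -> dict:
    """Build a device_map that puts the entire model on this process's GPU."""
    # Assumption: the launcher exports LOCAL_RANK for each worker process.
    # For a plain single-process run, fall back to default_index.
    local_rank = int(os.environ.get("LOCAL_RANK", default_index))
    # The empty-string key stands for "the whole model".
    return {"": local_rank}


device_map = per_process_device_map()
print(device_map)

# Hypothetical usage (needs transformers, bitsandbytes and a GPU; not run here):
# from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# model = AutoModelForCausalLM.from_pretrained(
#     "your/model-id",  # placeholder, replace with the actual checkpoint
#     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
#     device_map=device_map,  # instead of device_map="auto"
# )
```

Whether this resolves the error depends on how the training job is launched, so treat it as a starting point rather than a confirmed fix.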

saireddy changed discussion status to closed
