OOM on 2xH100
I am trying to load this model using unsloth like so:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "../Llama-4-Scout-17B-16E-Instruct-unsloth-dynamic-bnb-4bit",
)
FastLanguageModel.for_inference(model)
and I'm still OOM:
OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB. GPU 0 has a total capacity of 79.10 GiB of which 1.24 GiB is free. Including non-PyTorch memory, this process has 77.85 GiB memory in use. Of the allocated memory 77.33 GiB is allocated by PyTorch, and 10.03 MiB is reserved by PyTorch but unallocated.
Not 100% sure about Unsloth's syntax, but this should not be loaded as a language model; it's an image-text-to-text model. Perhaps the correct class would be FastModel instead?
That said, this might not even be relevant to your question. Sorry I can't help with the OOM issue.
I'm getting the same errors, and I also have 2 H100s. Shouldn't this be running on one H100?
Yeah, with 4-bit quantization it should definitely fit on one 80 GiB H100. The weights for 109B params would be roughly 109 × 2 = 218 GB in 16-bit, but a quarter of that in 4-bit, so ~55 GB.
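A quick sanity check on that arithmetic (a toy helper, not Unsloth code; it counts the weights only and ignores the KV cache, activations, and CUDA overhead, which is part of why real usage runs higher):

```python
def weight_memory_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Rough VRAM needed for model weights alone, in decimal GB.

    n_params_billion * 1e9 params * (bits / 8) bytes / 1e9 bytes-per-GB
    simplifies to n_params_billion * bits / 8.
    """
    return n_params_billion * bits_per_param / 8

print(weight_memory_gb(109, 16))  # 218.0 GB in fp16/bf16
print(weight_memory_gb(109, 4))   # 54.5 GB in 4-bit
```

So the weights themselves should be well under 80 GB; if the process is hitting 77+ GiB, something is probably being loaded or dequantized in 16-bit.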
Right, can the people at Unsloth share their script? My model is hitting way over 80 GB.
Wait for our official announcement! It should be tomorrow; the PR is in progress.