OOM on 2xH100
I am trying to load this model using unsloth like so:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "../Llama-4-Scout-17B-16E-Instruct-unsloth-dynamic-bnb-4bit",
)
FastLanguageModel.for_inference(model)
and I'm still OOM:
OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB. GPU 0 has a total capacity of 79.10 GiB of which 1.24 GiB is free. Including non-PyTorch memory, this process has 77.85 GiB memory in use. Of the allocated memory 77.33 GiB is allocated by PyTorch, and 10.03 MiB is reserved by PyTorch but unallocated.
Not 100% sure about Unsloth's syntax, but this should not be loaded as a language model; it's an image-text-to-text model. Perhaps the correct class would be FastModel instead?
That said, this might not even be relevant to your question. Sorry I can't help with the OOM issue.
I'm getting the same errors, and I also have 2 H100s. Shouldn't this be running on one H100?
Yeah, with 4-bit quantization it should definitely fit on one 80 GiB H100. The weights for 109B params would be roughly 109 × 2 = 218 GB in 16-bit, but a quarter of that in 4-bit, so ~55 GB.
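A quick sanity check on that arithmetic (a toy helper, not Unsloth code; it counts the weights only and ignores the KV cache, activations, and CUDA overhead, which is part of why real usage runs higher):

```python
def weight_memory_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Rough VRAM needed for model weights alone, in decimal GB.

    n_params_billion * 1e9 params * (bits / 8) bytes / 1e9 bytes-per-GB
    simplifies to n_params_billion * bits / 8.
    """
    return n_params_billion * bits_per_param / 8

print(weight_memory_gb(109, 16))  # 218.0 GB in fp16/bf16
print(weight_memory_gb(109, 4))   # 54.5 GB in 4-bit
```

So the weights themselves should be well under 80 GB; if the process is hitting 77+ GiB, something is probably being loaded or dequantized in 16-bit.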
Right, can the people at Unsloth share their script? My model is hitting way over 80 GB.
Wait for our official announcement! It should be tomorrow; the PR is in progress.