Error loading meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8: Files not found (OSError)
Hi everyone,
I'm encountering a persistent OSError when trying to load the meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 model. The error message indicates that specific .safetensors files are not found, even though they appear to be listed in the model.safetensors.index.json and are visible in the "Files and versions" tab on the Hub.
Error Message:
```
OSError: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 does not appear to have files named ('model-00059-of-00084.safetensors', 'model-00060-of-00084.safetensors', ... , 'model-00084-of-00084.safetensors'). Checkout 'https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8/tree/main' for available files.
```
(The list of missing files sometimes starts at a different file number, e.g., model-00054-of-00084.safetensors, but the nature of the error is the same.)
Environment:
Platform: RunPod
GPU: 2 x A100
Python Version: 3.10
transformers Version: 4.51.3
accelerate Version: 1.7.0
CUDA Version (implicitly via PyTorch): cu118 (PyTorch 2.1.0+cu118)
Code to Reproduce (simplified):
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import os

model_name = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

# Ensure the HF token is set from the environment variable (HF_KEY on RunPod).
hf_api_key = os.getenv("HF_KEY")
if hf_api_key:
    os.environ["HUGGING_FACE_HUB_TOKEN"] = hf_api_key
    os.environ["HF_TOKEN"] = hf_api_key

try:
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    print("Tokenizer loaded successfully.")

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        trust_remote_code=True,
    )
    print("Model loaded successfully.")
except Exception as e:
    print(f"Error loading model: {e}")
    import traceback
    traceback.print_exc()
```
Troubleshooting Steps Taken:
Verified Hugging Face Token: Ensured the token is correct, has read access, and the account is approved for Meta Llama models. Set the token via huggingface-cli login and also explicitly via environment variables in the Python script (HUGGING_FACE_HUB_TOKEN, HF_TOKEN from HF_KEY). huggingface-cli whoami confirms login.
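As an additional sanity check, the token can be passed explicitly instead of relying on environment variables (a minimal sketch; HF_KEY is the RunPod variable mentioned above):

```python
import os

from huggingface_hub import HfApi
from transformers import AutoTokenizer

model_name = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
hf_api_key = os.getenv("HF_KEY")  # RunPod secret holding the HF token

# Confirm the token is valid and resolves to the expected (Llama-approved) account.
print(HfApi(token=hf_api_key).whoami()["name"])

# Pass the token explicitly rather than via HF_TOKEN / HUGGING_FACE_HUB_TOKEN.
tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_api_key)
```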
Cleared Hugging Face Cache: Deleted ~/.cache/huggingface/hub with rm -rf and restarted the kernel multiple times.
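The same clearing can be done programmatically with the cache utilities in huggingface_hub, in case that is easier to reproduce (a sketch; equivalent in effect to the rm -rf above):

```python
from huggingface_hub import scan_cache_dir

repo_id = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

# Show what is cached and how much space it takes.
cache_info = scan_cache_dir()
print(cache_info.size_on_disk_str)

# Delete every cached revision of this repo.
revisions = [
    rev.commit_hash
    for repo in cache_info.repos
    if repo.repo_id == repo_id
    for rev in repo.revisions
]
if revisions:
    cache_info.delete_revisions(*revisions).execute()
```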
Checked model.safetensors.index.json: Confirmed that the "missing" files listed in the error message ARE indeed referenced in the model.safetensors.index.json file on the Hub.
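One way to double-check this programmatically (a sketch; it simply compares the shard names referenced by the index against the files the Hub API reports for the repo):

```python
import json

from huggingface_hub import HfApi, hf_hub_download

repo_id = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

# Download only the index file and collect the shard names it references.
index_path = hf_hub_download(repo_id, "model.safetensors.index.json")
with open(index_path) as f:
    index = json.load(f)
expected_shards = set(index["weight_map"].values())

# Ask the Hub which files the repo actually exposes.
repo_files = set(HfApi().list_repo_files(repo_id))

missing = sorted(expected_shards - repo_files)
print(f"{len(expected_shards)} shards referenced, {len(missing)} not visible via the API")
print(missing[:5])
```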
Attempted Library Updates: Tried to upgrade transformers and accelerate with pip install --upgrade (and upgraded pip itself), but the versions stayed at transformers==4.51.3 and accelerate==1.7.0, which appear to be the latest resolvable in the RunPod environment.
trust_remote_code=True is used.
Sufficient VRAM: With 2xA100, VRAM should not be an issue for loading, especially an FP8 model. The error occurs before significant VRAM allocation, during file resolution.
Despite these steps, the OSError persists, suggesting the transformers library cannot locate or access these specific sharded model files during the download/resolution process from the Hub, even though they are listed in the index.
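One further isolation step I am considering is pre-fetching the full snapshot with huggingface_hub and then loading from the local path, to separate Hub file resolution from the actual model loading (a rough sketch; max_workers is arbitrary and the full download is large):

```python
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM

repo_id = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

# Download every file in the repo up front; a genuinely missing shard should
# fail here with a clearer error than the lazy resolution inside from_pretrained.
local_path = snapshot_download(repo_id, max_workers=4)

# Load purely from the local snapshot, taking the Hub out of the equation.
model = AutoModelForCausalLM.from_pretrained(local_path, device_map="auto")
```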
Has anyone else experienced a similar issue with this model or other sharded models, particularly in a cloud environment like RunPod? Any insights or suggestions on what might be causing this or how to further debug would be greatly appreciated.
Could this be an issue with the model repository itself, or perhaps a very specific incompatibility with the library versions available in my current environment?
Thank you for your help!