ValueError: `rope_scaling` must be a dictionary with two fields

#15
by jsemrau - opened

ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

Using the standard script on Huggingface, I get this error message. What needs to be done here?

I am running into this same issue.

Solution : pip install --upgrade transformers
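
If the error persists after upgrading, it may be that the upgrade landed in a different environment than the one running the script. Here is a quick sanity check (a minimal sketch, assuming the packaging library is available, which transformers itself depends on):

import transformers
from packaging import version

# Support for the "llama3" rope_type landed in the 4.43 line, so anything older
# will reject the new rope_scaling dict in config.json.
print(transformers.__version__)
assert version.parse(transformers.__version__) >= version.parse("4.43.0"), \
    "Upgrade transformers: pip install --upgrade transformers"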

Works! I previously had transformers==4.38.2; upgrading resolved the rope_scaling error, but it then produced a 'top_k_top_p_filtering' ImportError. For those encountering this second error: the solution is pip install --upgrade trl.

In order to run LLaMA 3.1 in the same environment as existing LLaMA 3 deployments, some additional package upgrades might be necessary. I've also had to upgrade vLLM, my backend, to use LLaMA 3.1, as it was throwing rope-scaling-related errors as well. If you encounter issues similar to the one described above, keep upgrading the packages that produce errors and, hopefully, the issue will be resolved.

I am having the same issue. When attempting to load the model with textgenwebui I get the same kind of error and I have updated all requirements/dependencies including transformers.

Perhaps textgenwebui hasn't been updated. Try filing an issue with textgenwebui.

Which specific vLLM and transformers versions work for LLaMA 3.1?

I have

transformers==4.43.1
vllm==0.5.3.post1

I am still getting the same error even after upgrading the vllm and transformers versions:
ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

I found a 'fix' but I'm not sure what the side effects might be. In the config.json file of the model find the entry that says "rope_scaling" and replace it with this

"rope_scaling": {
"factor": 8.0,
"type": "dynamic"
},

I honestly do not know what these values mean, I just fed in values the loader said it wanted and it seems to work.
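
A caveat on this workaround: the upstream config's "rope_type": "llama3" entry describes Llama 3.1's own frequency-dependent RoPE scaling, and swapping it for generic "dynamic" scaling likely changes long-context behaviour, so upgrading transformers and keeping the original config is probably safer. A quick way to check whether your installed version can parse the unmodified config (a sketch; the model id is assumed to be the 8B Instruct repo):

from transformers import AutoConfig

# On transformers >= 4.43 this loads cleanly and prints the llama3 rope_scaling
# dict unchanged; on older versions it raises the ValueError from this thread.
cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(cfg.rope_scaling)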

I am still getting the same error even after upgrading the vllm and transformers versions.

If it helps here is pip freeze: https://gist.github.com/macsz/4735d3b5265040ffda1220f0b2480acc

I'm also getting this error while loading Llama 3.1 8B Instruct:

ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

Solution : pip install --upgrade transformers

Worked for me, ty!

I tried this, and also upgraded the libraries mentioned in this discussion.

I had the same problem; upgrading transformers and pip worked! Do not forget to restart the kernel after upgrading packages.

I've had the same issue and
pip install --upgrade transformers
was enough and worked for me.

Please update both vllm and transformers

pip install --upgrade transformers
pip install --upgrade vllm

I've had the same issue and pip install --upgrade transformers was enough and worked for me.

Can you please share your requirements.txt?

Please update both vllm and transformers

pip install --upgrade transformers
pip install --upgrade vllm

still not working for me.

I found a 'fix' but I'm not sure what the side effects might be: in the model's config.json, replace the "rope_scaling" entry with { "factor": 8.0, "type": "dynamic" }.

After doing this, I am getting the error below:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacty of 44.32 GiB of which 6.96 GiB is free. Process 1447583 has 608.00 MiB memory in use. Process 808909 has 612.00 MiB memory in use. Process 1213235 has 528.00 MiB memory in use. Process 1658457 has 19.74 GiB memory in use. Including non-PyTorch memory, this process has 15.87 GiB memory in use. Of the allocated memory 15.23 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But Llama 3 loads fine.

What was there before editing this?

"rope_scaling": {
    "factor": 8.0,
    "type": "dynamic"
},

The out-of-memory error is a whole different set of problems: you don't have enough free VRAM to run the model. Do you use quantization when running the model?
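
If quantization is an option, here is a minimal sketch of loading the model in 4-bit via bitsandbytes (assumes bitsandbytes and accelerate are installed alongside an up-to-date transformers; the model id is the 8B Instruct repo):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4-bit weights shrink the memory footprint to roughly a quarter of bf16.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)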

Everyone who is using the HuggingFace Estimator in SageMaker 2.226.0 (latest) will have to wait, because it currently supports only the transformers 4.36.0 image.

"I have transformers==4.43.1"

This works. It should be version 4.43+.
https://github.com/huggingface/transformers/releases

pip install transformers==4.43.1

After editing rope_scaling as above, I am getting torch.cuda.OutOfMemoryError: CUDA out of memory [...], but Llama 3 loads fine.

LLaMA 3.1 has a larger context window of 128K vs 8K in LLaMA 3. Try reducing it and see if it works. In vLLM you can set it with the --max-model-len param.
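
For reference, a minimal offline vLLM sketch with the context length capped (assumes vllm >= 0.5.3.post1, which handles Llama 3.1's rope scaling; the model id is the 8B Instruct repo):

from vllm import LLM, SamplingParams

# Capping max_model_len avoids reserving KV-cache space for the full 128K window.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_model_len=8192)
outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)

The equivalent flag for the OpenAI-compatible server is --max-model-len 8192, as mentioned above.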

It worked when I set "rope_scaling": null.
I am not sure how this affects the inference results, but it is working now.

None of the above solutions have worked. I have transformers and vllm fully upgraded, and I've tried editing the rope_scaling parameter itself, but I keep running into OOM errors even though I'm running on an A100 80GB. Does anyone have any more solutions? I'm not against outside-the-box thinking at this point.

Remove ${HF_HOME} and re-run? Maybe some garbage got cached.

Remove ${HF_HOME} and re-run? Maybe some garbage got cached.

no dice :/

None of the above solutions have worked; I keep running into OOM errors even though I'm running on an A100 80GB.

same problem

Meta Llama org

As @jsemrau mentioned, please make sure that you are on transformers 4.43.2 (or higher) by running pip install --upgrade transformers. This should fix the original issue about rope_scaling. For other issues (like OOM problems), I would suggest opening new issues and providing system details.
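
For completeness, after the upgrade the standard transformers text-generation pipeline should load the 3.1 config without the rope_scaling ValueError (a sketch, assuming enough GPU memory for the bf16 weights):

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
print(pipe("Hello, how are you?", max_new_tokens=32)[0]["generated_text"])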

Solution : pip install --upgrade transformers

Also works for me, cheers!

How does one do this via the Docker installation of TGI? Do we need to build a separate Dockerfile first with an upgraded transformers?

After changing to
"rope_scaling": { "factor": 8.0, "type": "dynamic" },
it proceeds further, then:

... Lib\site-packages\transformers\integrations\awq.py", line 354, in _fuse_awq_mlp
    new_module = target_cls(gate_proj, down_proj, up_proj, activation_fn)
  File "d:\code\autoawq\awq\modules\fused\mlp.py", line 41, in __init__
    self.linear = awq_ext.gemm_forward_cuda
NameError: name 'awq_ext' is not defined

;D

Everyone that is using SageMaker 2.226.0 (latest) will have to wait because it currently supports only the transformers 4.36.0 image.

Does anyone know where I can get a follow-up on this?

I encountered the same issue, and running the following command resolved it for me:

pip install --upgrade transformers==4.43.3

P.S. If you encounter the same issue repeatedly, check whether other libraries are installing different versions of transformers (e.g. bitsandbytes). For best results, after installing all the libraries, update transformers to version 4.43.3.

I am getting an error after fine-tuning the Llama 3.1 8B Instruct model and deploying it to SageMaker. I configured SageMaker to use HuggingFace Transformers 4.43, and the deployment was successful. However, when I try to test the endpoint, it gives this error. How can I run pip install --upgrade transformers==4.43.2?

Received client error (400) from 3VSBZEPFose1o1Q8vAytfGhMQD1cnCE5T83b with message "{ "code": 400, "type": "InternalServerException", "message": "`rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'high_freq_factor': 4.0, 'low_freq_factor': 1.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}" }"

Edit: Disclaimer -> for AWS.
Changing the image worked for me. I struggled with all the other recommended images, as well as a custom image with upgraded transformers. I can't yet explain it.

This one:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0
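
In case it helps, a rough deployment sketch with that image via the SageMaker Python SDK (the role ARN and instance type are placeholders, and HF_MODEL_ID / HUGGING_FACE_HUB_TOKEN are the standard TGI container variables; adjust for your account and region):

from sagemaker.huggingface import HuggingFaceModel

image_uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"
)

model = HuggingFaceModel(
    image_uri=image_uri,
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder role
    env={
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "HUGGING_FACE_HUB_TOKEN": "<your-hf-token>",
    },
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")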

This error can also be resolved by updating train.py in axolotl.cli to inject the rope_scaling configuration when it is missing or incomplete, like the following (I did this for my fine-tuning and it worked):

# Inject rope_scaling configuration if missing or incomplete
if not getattr(cfg, 'rope_scaling', None) or 'type' not in cfg.rope_scaling or 'factor' not in cfg.rope_scaling:
    LOG.warning("`rope_scaling` not found or incomplete in config, applying defaults.")
    cfg.rope_scaling = {
        "type": "linear",  # you can set it to "dynamic" if that's preferred
        "factor": 8.0,
    }
