ValueError: `rope_scaling` must be a dictionary with two fields

#15
by jsemrau - opened

ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

Using the standard script on Huggingface, I get this error message. What needs to be done here?

I am running into this same issue.

Solution : pip install --upgrade transformers
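
If the error persists after upgrading, it may be that the upgrade landed in a different environment than the one running the script. Here is a quick sanity check (a minimal sketch, assuming the packaging library is available, which transformers itself depends on):

import transformers
from packaging import version

# Support for the "llama3" rope_type landed in the 4.43 line, so anything older
# will reject the new rope_scaling dict in config.json.
print(transformers.__version__)
assert version.parse(transformers.__version__) >= version.parse("4.43.0"), \
    "Upgrade transformers: pip install --upgrade transformers"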

Works! I previously had transformers==4.38.2; upgrading resolved the rope_scaling error, but it then produced a 'top_k_top_p_filtering' ImportError. For those encountering this second error: the solution is pip install --upgrade trl.

In order to run LLaMA 3.1 in the same environment as existing LLaMA 3 deployments, some additional package upgrades might be necessary. I've also had to upgrade vLLM, my backend, to use LLaMA 3.1, as it was throwing rope-scaling-related errors as well. If you encounter issues similar to the one described above, keep upgrading the packages that produce errors and, hopefully, the issue will be resolved.

I am having the same issue. When attempting to load the model with textgenwebui I get the same kind of error and I have updated all requirements/dependencies including transformers.

Perhaps textgenwebui hasn't been updated. Try filing an issue with textgenwebui.

Which specific vLLM and transformers versions work for LLaMA 3.1?

I have

transformers==4.43.1
vllm==0.5.3.post1

I am still getting the same error even after upgrading the vllm and transformers versions:
ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

I found a 'fix' but I'm not sure what the side effects might be. In the config.json file of the model find the entry that says "rope_scaling" and replace it with this

"rope_scaling": {
"factor": 8.0,
"type": "dynamic"
},

I honestly do not know what these values mean, I just fed in values the loader said it wanted and it seems to work.
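
A caveat on this workaround: the upstream config's "rope_type": "llama3" entry describes Llama 3.1's own frequency-dependent RoPE scaling, and swapping it for generic "dynamic" scaling likely changes long-context behaviour, so upgrading transformers and keeping the original config is probably safer. A quick way to check whether your installed version can parse the unmodified config (a sketch; the model id is assumed to be the 8B Instruct repo):

from transformers import AutoConfig

# On transformers >= 4.43 this loads cleanly and prints the llama3 rope_scaling
# dict unchanged; on older versions it raises the ValueError from this thread.
cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(cfg.rope_scaling)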

I am still getting the same error even after upgrading the vllm and transformers versions.

If it helps here is pip freeze: https://gist.github.com/macsz/4735d3b5265040ffda1220f0b2480acc

I'm also getting this error while loading Llama 3.1 8B Instruct:

ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

Solution : pip install --upgrade transformers

Worked for me, ty!

I tried this, and also upgraded the libraries mentioned in this discussion.

I had the same problem; upgrading transformers and pip worked! Do not forget to restart the kernel after upgrading packages.

I've had the same issue and
pip install --upgrade transformers
was enough and worked for me.

Please update both vllm and transformers

pip install --upgrade transformers
pip install --upgrade vllm

I've had the same issue and pip install --upgrade transformers was enough and worked for me.

Can you please share your requirements.txt?

Please update both vllm and transformers

pip install --upgrade transformers
pip install --upgrade vllm

still not working for me.

I found a 'fix' but I'm not sure what the side effects might be: in the model's config.json, replace the "rope_scaling" entry with { "factor": 8.0, "type": "dynamic" }.

After doing this, I am getting the error below:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacty of 44.32 GiB of which 6.96 GiB is free. Process 1447583 has 608.00 MiB memory in use. Process 808909 has 612.00 MiB memory in use. Process 1213235 has 528.00 MiB memory in use. Process 1658457 has 19.74 GiB memory in use. Including non-PyTorch memory, this process has 15.87 GiB memory in use. Of the allocated memory 15.23 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But Llama 3 loads fine.

What was there before editing this?

"rope_scaling": {
    "factor": 8.0,
    "type": "dynamic"
},

The out-of-memory error is a whole different set of problems: you don't have enough free VRAM to run the model. Do you use quantization when running the model?
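
If quantization is an option, here is a minimal sketch of loading the model in 4-bit via bitsandbytes (assumes bitsandbytes and accelerate are installed alongside an up-to-date transformers; the model id is the 8B Instruct repo):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4-bit weights shrink the memory footprint to roughly a quarter of bf16.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)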

Everyone who is using the HuggingFace Estimator in SageMaker 2.226.0 (latest) will have to wait, because it currently supports only the transformers 4.36.0 image.

"I have transformers==4.43.1"

This works. It should be version 4.43+.
https://github.com/huggingface/transformers/releases

pip install transformers==4.43.1

After editing rope_scaling as above, I am getting torch.cuda.OutOfMemoryError: CUDA out of memory [...], but Llama 3 loads fine.

LLaMA 3.1 has a larger context window of 128K vs 8K in LLaMA 3. Try reducing it and see if it works. In vLLM you can set it with the --max-model-len param.
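
For reference, a minimal offline vLLM sketch with the context length capped (assumes vllm >= 0.5.3.post1, which handles Llama 3.1's rope scaling; the model id is the 8B Instruct repo):

from vllm import LLM, SamplingParams

# Capping max_model_len avoids reserving KV-cache space for the full 128K window.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_model_len=8192)
outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)

The equivalent flag for the OpenAI-compatible server is --max-model-len 8192, as mentioned above.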

It worked when I set "rope_scaling": null.
I am not sure how this affects the inference results, but it is working now.

None of the above solutions have worked. I have transformers and vllm fully upgraded, and I've tried editing the rope_scaling parameter itself, but I keep running into OOM errors even though I'm running on an A100 80GB. Does anyone have any more solutions? I'm not against outside-the-box thinking at this point.

Remove ${HF_HOME} and re-run? Maybe some garbage got cached.

Remove ${HF_HOME} and re-run? Maybe some garbage got cached.

no dice :/

None of the above solutions have worked; I keep running into OOM errors even though I'm running on an A100 80GB.

same problem

Meta Llama org

As @jsemrau mentioned, please make sure that you are on transformers 4.43.2 (or higher) by running pip install --upgrade transformers. This should fix the original issue about rope_scaling. For other issues (like OOM problems), I would suggest opening new issues and providing system details.
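
For completeness, after the upgrade the standard transformers text-generation pipeline should load the 3.1 config without the rope_scaling ValueError (a sketch, assuming enough GPU memory for the bf16 weights):

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
print(pipe("Hello, how are you?", max_new_tokens=32)[0]["generated_text"])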

Solution : pip install --upgrade transformers

Also works for me, cheers!

How does one do this via the Docker installation of TGI? Do we need to build a separate Dockerfile first with an upgraded transformers?

After changing to
"rope_scaling": { "factor": 8.0, "type": "dynamic" },
it proceeds further, then:

... Lib\site-packages\transformers\integrations\awq.py", line 354, in _fuse_awq_mlp
    new_module = target_cls(gate_proj, down_proj, up_proj, activation_fn)
  File "d:\code\autoawq\awq\modules\fused\mlp.py", line 41, in __init__
    self.linear = awq_ext.gemm_forward_cuda
NameError: name 'awq_ext' is not defined

;D

Everyone that is using SageMaker 2.226.0 (latest) will have to wait because it currently supports only the transformers 4.36.0 image.

Does anyone know where I can get a follow-up on this?

I encountered the same issue, and running the following command resolved it for me:

pip install --upgrade transformers==4.43.3

P.S. If you encounter the same issue repeatedly, check whether other libraries are installing different versions of transformers (e.g. bitsandbytes). For best results, after installing all the libraries, update transformers to version 4.43.3.

I am getting an error after fine-tuning the Llama 3.1 8B Instruct model and deploying it to SageMaker. I configured SageMaker to use HuggingFace Transformers 4.43, and the deployment was successful. However, when I try to test the endpoint, it gives this error. How can I run pip install --upgrade transformers==4.43.2?

Received client error (400) from 3VSBZEPFose1o1Q8vAytfGhMQD1cnCE5T83b with message "{ "code": 400, "type": "InternalServerException", "message": "`rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'high_freq_factor': 4.0, 'low_freq_factor': 1.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}" }"

Edit: Disclaimer -> for AWS.
Changing the image worked for me. I struggled with all the other recommended images, as well as a custom image with upgraded transformers. I can't yet explain it.

This one:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0
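
In case it helps, a rough deployment sketch with that image via the SageMaker Python SDK (the role ARN and instance type are placeholders, and HF_MODEL_ID / HUGGING_FACE_HUB_TOKEN are the standard TGI container variables; adjust for your account and region):

from sagemaker.huggingface import HuggingFaceModel

image_uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"
)

model = HuggingFaceModel(
    image_uri=image_uri,
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder role
    env={
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "HUGGING_FACE_HUB_TOKEN": "<your-hf-token>",
    },
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")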

This error can also be resolved by updating train.py in axolotl.cli to inject the rope_scaling configuration when it is missing or incomplete, like the following (I did this for my fine-tuning and it worked):

# Inject rope_scaling configuration if missing or incomplete
if not getattr(cfg, 'rope_scaling', None) or 'type' not in cfg.rope_scaling or 'factor' not in cfg.rope_scaling:
    LOG.warning("`rope_scaling` not found or incomplete in config, applying defaults.")
    cfg.rope_scaling = {
        "type": "linear",  # you can set it to "dynamic" if that's preferred
        "factor": 8.0,
    }
