Is there a way to train jina-embeddings-v3 using the Transformers Trainer API? (Encountering NameError in rotary.py)
Hi everyone,
I'm attempting to fine-tune the jina-embeddings-v3 model for a sequence classification task. My goal is to use the standard Hugging Face `transformers.Trainer` API for this process.
To achieve this, I've been following these steps:
- Loading the base model with `AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True, config=modified_config)`, where `modified_config` might have `lora_main_params_trainable` adjusted.
- Creating a custom PyTorch `nn.Module` wrapper around this base Jina model (a condensed sketch appears after this list). This wrapper:
  - Adds a classification head (`nn.Linear`).
  - Implements mean pooling over the base model's `last_hidden_state` (as the model doesn't use a CLS token for sequence representation).
  - Handles passing an `adapter_mask` to the base model's forward pass if a specific LoRA adaptation (like "classification") is targeted.
  - Computes `CrossEntropyLoss` if labels are provided.
- Setting `requires_grad` appropriately to fine-tune only the LoRA adapter parameters (identified by "parametrizations" in their names) and the new classification head, keeping other base model weights frozen (if `lora_main_params_trainable=False`).
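For reference, here is a condensed sketch of the setup described above (class and variable names are placeholders, `num_labels` is arbitrary, and the `adapter_mask` handling is simplified; it's not the exact code I'm running):

```python
import torch.nn as nn
from transformers import AutoModel

class JinaForSequenceClassification(nn.Module):
    """Simplified wrapper: mean pooling over last_hidden_state + linear head + CE loss."""

    def __init__(self, base_model, num_labels=2):
        super().__init__()
        self.base_model = base_model
        self.classifier = nn.Linear(base_model.config.hidden_size, num_labels)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        # In my real code an adapter_mask selecting the "classification" LoRA
        # adapter is also passed through here; omitted in this sketch.
        outputs = self.base_model(input_ids=input_ids, attention_mask=attention_mask)

        # Mean pooling over non-padding tokens (the model has no CLS representation).
        hidden = outputs.last_hidden_state                # (batch, seq, hidden)
        mask = attention_mask.unsqueeze(-1).float()       # (batch, seq, 1)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

        logits = self.classifier(pooled)
        loss = self.loss_fn(logits, labels) if labels is not None else None
        return {"loss": loss, "logits": logits}

base = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
model = JinaForSequenceClassification(base)

# Train only the LoRA parametrizations and the new classification head.
for name, param in model.named_parameters():
    param.requires_grad = "parametrizations" in name or name.startswith("classifier")
```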
The forward pass of my custom wrapped model works correctly when tested with tokenized inputs. However, when I try to train it using `trainer.train()`, I consistently encounter a `NameError` during the backward pass.
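The training setup itself is just the stock `Trainer` recipe, roughly along these lines (hyperparameters, paths, and dataset variables are placeholders):

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./jina-cls",              # placeholder output path
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=1e-4,
)

trainer = Trainer(
    model=model,                          # the wrapped model described above
    args=training_args,
    train_dataset=tokenized_train_ds,     # pre-tokenized datasets (not shown)
    eval_dataset=tokenized_eval_ds,
)

trainer.train()  # <-- raises the NameError below during the backward pass
```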
Here's the relevant part of the traceback:
Traceback (most recent call last):
File "/path/to/your/script.py", line X, in <your_training_initiation_function_or_cell> # Adjust path
trainer.train()
File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2245, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2560, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3782, in training_step
self.accelerator.backward(loss, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/accelerate/accelerator.py", line 2359, in backward
loss.backward(**kwargs)
File "/usr/local/lib/python3.11/dist-packages/torch/_tensor.py", line 626, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.11/dist-packages/torch/autograd/__init__.py", line 347, in backward
_engine_run_backward(
File "/usr/local/lib/python3.11/dist-packages/torch/autograd/graph.py", line 823, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/autograd/function.py", line 307, in apply
return user_fn(self, *args)
^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/jinaai/xlm-roberta-flash-implementation/2b6bc3f30750b3a9648fe9b63448c09920efe9be/rotary.py", line 257, in backward
apply_rotary( # <--- THIS IS THE LINE CAUSING THE NameError
^^^^^^^^^^^^
NameError: name 'apply_rotary' is not defined
The error `NameError: name 'apply_rotary' is not defined` seems to originate from the model's custom `rotary.py` file (specifically the one cached from `jinaai/xlm-roberta-flash-implementation`). This suggests an issue within the model's own implementation of the backward pass for its rotary position embeddings: the `apply_rotary` function is not accessible within the scope of the custom `torch.autograd.Function`'s `backward` method.
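If it helps, my reading of the failure is a pattern roughly like the following (a hypothetical simplification to illustrate the scoping issue, not the actual contents of `rotary.py`): `apply_rotary` is only bound if an optional kernel import succeeds, the forward pass takes a fallback path that doesn't need it, but `backward` still references the name.

```python
import torch

try:
    # Optional fused kernel; the real import in rotary.py may differ.
    from some_optional_kernel_lib import apply_rotary  # hypothetical module name
except ImportError:
    pass  # apply_rotary is never bound in this module's namespace

class ApplyRotaryEmb(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, cos, sin):
        ctx.save_for_backward(cos, sin)
        # Pure-PyTorch fallback path (placeholder math standing in for the
        # rotary rotation) -- this is why the forward pass succeeds.
        return (x * cos) + (x * sin)

    @staticmethod
    def backward(ctx, grad_output):
        cos, sin = ctx.saved_tensors
        # Backward unconditionally calls the possibly-unbound name, so the first
        # loss.backward() raises: NameError: name 'apply_rotary' is not defined
        return apply_rotary(grad_output, cos, sin), None, None
```

If the optional import fails, inference works but the very first `loss.backward()` hits the undefined name, which matches what I'm seeing.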
My questions are:
- Has anyone else encountered this `NameError` when trying to fine-tune this Jina model (or related Jina embedding models that might use the same `rotary.py` module)?
- Is there a known fix or workaround for this issue that would allow training with the `transformers.Trainer`?
- Given this internal error, what is the recommended approach by Jina AI for fine-tuning these models for a downstream task like sequence classification if the standard `Trainer` approach hits this roadblock? (The model card mentions `SentenceTransformerTrainer`, but my aim was to use the standard `transformers.Trainer` with a classification head.)
Any insights or suggestions would be greatly appreciated!
Thanks!