Fine-tune model (SFTTrainer and transformers.Trainer)
#1 by NickyNicky - opened
dataset_new
Dataset({
    features: ['text'],
    num_rows: 15011
})

lm_dataset
Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 1031
})
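(Not shown in the thread, but for context: lm_dataset presumably comes from tokenizing dataset_new and packing the tokens into fixed-length blocks, which would explain the drop from 15011 rows to 1031. A minimal sketch of that standard preprocessing, assuming the `tokenizer` and `max_length` already defined in this notebook; the helper names are hypothetical:)

# Hypothetical preprocessing sketch: turn the text-only dataset into
# input_ids / attention_mask / labels for causal-LM training.
def tokenize(batch):
    return tokenizer(batch["text"])

tokenized = dataset_new.map(tokenize, batched=True, remove_columns=["text"])

def group_texts(examples):
    # Concatenate all token lists, then split into fixed-size blocks (packing);
    # this is why lm_dataset has far fewer rows than dataset_new.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = (len(concatenated["input_ids"]) // max_length) * max_length
    result = {
        k: [t[i : i + max_length] for i in range(0, total_length, max_length)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()  # causal LM: labels mirror inputs
    return result

lm_dataset = tokenized.map(group_texts, batched=True)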
from trl import SFTTrainer  # https://huggingface.co/docs/trl/sft_trainer
import transformers

max_seq_length = max_length  # get_max_length()  # max sequence length for model and packing of the dataset

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_new,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    # packing=True,
    # formatting_func=format_instruction,
    args=args,
)

'''ERROR 1
ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,labels,attention_mask.
'''
trainer.train()  # there will not be a progress bar since tqdm is disabled

# trainer = transformers.Trainer(
#     model=model,
#     train_dataset=lm_dataset,
#     # eval_dataset=val_dataset,
#     tokenizer=tokenizer,
#     args=args,
#     data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
# )
# '''ERROR 2
# ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,attention_mask,labels.
# '''
# trainer.train()
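(A note on ERROR 1, not stated explicitly in the thread: the Trainer did pass `labels`, yet the model's forward returned only `logits`, which usually means the checkpoint was loaded without a language-modeling head, or with custom code whose forward ignores `labels`. The cleanest fix is typically to load the checkpoint with `AutoModelForCausalLM` so transformers computes the loss itself; failing that, a hedged workaround is to compute the loss in a Trainer subclass, sketched below with assumed names:)

import torch
import transformers

class CausalLMLossTrainer(transformers.Trainer):
    """Hypothetical workaround: compute the causal-LM loss ourselves when the
    model's forward() returns only logits and ignores `labels`."""

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits if hasattr(outputs, "logits") else outputs[0]
        # Shift so that tokens < n predict token n (standard next-token objective).
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        loss = torch.nn.functional.cross_entropy(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
            ignore_index=-100,  # skip padding / masked positions
        )
        return (loss, outputs) if return_outputs else loss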
Thanks for catching this! Looking into it now.
deleted
changed discussion status to closed
!pip install transformers==4.33.2 -qqq
!pip install bitsandbytes==0.38.0 -qqq
!pip install "datasets==2.13.0" peft accelerate trl "safetensors>=0.3.1" --upgrade -qqq
!pip install ninja packaging --upgrade -qqq
!pip install sentencepiece -qqq
!pip install -U xformers deepspeed -qqq
!python -c "import torch; assert torch.cuda.get_device_capability()[0] >= 8, 'Hardware not supported for Flash Attention'"
!export CUDA_HOME=/usr/local/cuda-11.8
# !MAX_JOBS=4 pip install flash-attn --no-build-isolation
!MAX_JOBS=4 pip install flash-attn --no-build-isolation -qqq
!pip install git+"https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary" -qqq
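(Optional sanity check, not part of the original notebook: confirm that the flash-attn build and the rotary extension installed from csrc/rotary actually import before training.)

# Assumed quick check that the compiled extensions installed above are usable.
import flash_attn
print("flash-attn version:", flash_attn.__version__)
import rotary_emb  # extension built from the csrc/rotary subdirectory install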
from trl import SFTTrainer  # https://huggingface.co/docs/trl/sft_trainer
import transformers

liberaMemoria()  # "free memory": helper presumably defined earlier in the notebook to release GPU memory

max_seq_length = max_length  # get_max_length()  # max sequence length for model and packing of the dataset

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_new,  # lm_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    # packing=True,
    # formatting_func=format_instruction,
    args=args,
)
trainer.train()  # there will not be a progress bar since tqdm is disabled
You are using 8-bit optimizers with a version of `bitsandbytes` < 0.41.1. It is recommended to update your version as a major bug has been fixed in 8-bit optimizers.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-259460b0944c> in <cell line: 21>()
19 ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,labels,attention_mask.
20 '''
---> 21 trainer.train() # there will not be a progress bar since tqdm is disabled
22
23
3 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in create_optimizer(self)
983 )
984 else:
--> 985 self.optimizer = optimizer_cls(optimizer_grouped_parameters, **optimizer_kwargs)
986 if optimizer_cls.__name__ == "Adam8bit":
987 import bitsandbytes
TypeError: AdamW.__init__() got an unexpected keyword argument 'is_paged'
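(Context on this TypeError, not spelled out in the thread: transformers' create_optimizer passes `is_paged` to bitsandbytes' AdamW, but that argument only exists in newer bitsandbytes releases that ship paged optimizers, so bitsandbytes 0.38.0 rejects it. Besides upgrading bitsandbytes, as done below, a hedged alternative is to switch the optimizer away from bitsandbytes entirely; the values here are placeholders:)

from transformers import TrainingArguments

# Hypothetical workaround: use plain PyTorch AdamW so no bitsandbytes
# optimizer kwargs (such as is_paged) are involved at all.
args = TrainingArguments(
    output_dir="out",               # placeholder; keep your existing arguments otherwise
    optim="adamw_torch",            # instead of an 8-bit / paged bnb optimizer
    per_device_train_batch_size=1,
)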
NickyNicky changed discussion status to open
resolved:
!pip install transformers==4.33.2 -qqq
!pip install bitsandbytes==0.40.0 -qqq
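(Optional check, not in the original thread, that the pins actually took effect after the reinstall; in Colab a runtime restart is usually needed first.)

import transformers, bitsandbytes
print("transformers:", transformers.__version__)   # expected 4.33.2
print("bitsandbytes:", bitsandbytes.__version__)   # expected 0.40.0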
NickyNicky changed discussion status to closed
NickyNicky changed discussion status to open
This comment has been hidden
Hey @NickyNicky, very cool and interesting! I've carved out time to figure this out, and I'm really glad you're working on it; I'd love to help. It seems you're working off some existing code, maybe from a public source. I'd love to share my results and keep gathering the information and data needed for the fine-tuning experiments, so if you share what you're working with, I'll work off that instead of my usual notebooks ripped from Medium or wherever. ^^