CUDA error: device-side assert triggered

#2
by raptorkwok - opened

Hi Ayaka,

I get the following error when training with this model:

../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [573,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
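
The message above suggests passing CUDA_LAUNCH_BLOCKING=1 so that the stack trace points at the call that actually failed. A minimal sketch of doing that from inside the training script, assuming it is placed at the very top, before torch is imported or any CUDA work starts:

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback
# lines up with the operation that triggered the assert.
# This has to be set before CUDA is initialised, so it goes
# ahead of `import torch` and any model or trainer code.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

The same effect can be had by setting the variable in the shell when launching the script. This slows training down, so it is only meant for debugging runs.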

My code is as follows:

from transformers import (
    BartForConditionalGeneration,
    BertTokenizer,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
)

checkpoint = 'Ayaka/bart-base-cantonese'
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint, output_attentions = True, output_hidden_states = True)
batch_size = 4
training_args = CustomSeq2SeqTrainingArguments(
    output_dir = output_model,
    evaluation_strategy = "epoch",
    optim = "adamw_torch", 
    eval_steps = 5000, # Previously: 1000
    #logging_steps = 1000,
    #save_steps = 5000,
    save_strategy = "epoch",
    learning_rate = 2e-5,
    per_device_train_batch_size = batch_size,
    per_device_eval_batch_size = batch_size,
    weight_decay = 0.01,
    save_total_limit = 1,
    num_train_epochs = 30, 
    predict_with_generate=True,
    remove_unused_columns=True,
    fp16 = True,
    push_to_hub = True,
    metric_for_best_model = "bleu", 
    load_best_model_at_end = True,
    report_to = "wandb"
)

trainer = Seq2SeqTrainer(
    model = model,
    #model_init = model_init,
    args = training_args,
    train_dataset = tokenized_yuezh_master['train'],
    eval_dataset = tokenized_yuezh_master['val'],
    tokenizer = tokenizer,
    data_collator = data_collator,
    compute_metrics = compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]
)
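
The assertion `srcIndex < srcSelectDimSize` is raised by an index-select kernel, which is typically what an embedding lookup uses, so one likely thing to check is whether any token id in the dataset is at or above the size of the model's embedding table. The following is a minimal sketch of such a check in plain Python. The helper name `find_out_of_range_ids` is hypothetical, and treating -100 as the ignored label value is an assumption based on the usual Hugging Face convention.

```python
IGNORE_INDEX = -100  # assumed label padding value that the loss skips


def find_out_of_range_ids(rows, table_size, ignore_index=IGNORE_INDEX):
    """Return (row, position, token_id) for each id outside [0, table_size).

    `rows` is any iterable of integer sequences, e.g. a dataset column
    of input_ids or labels. Entries equal to `ignore_index` are skipped.
    """
    bad = []
    for row_idx, row in enumerate(rows):
        for pos, token_id in enumerate(row):
            if token_id == ignore_index:
                continue
            if token_id < 0 or token_id >= table_size:
                bad.append((row_idx, pos, token_id))
    return bad


# Hypothetical usage with the objects from the post (not run here):
#   table_size = model.get_input_embeddings().num_embeddings
#   print(len(tokenizer), table_size)
#   print(find_out_of_range_ids(tokenized_yuezh_master['train']['input_ids'], table_size)[:10])
#   print(find_out_of_range_ids(tokenized_yuezh_master['train']['labels'], table_size)[:10])
```

If this reports any ids, the tokenizer and the model's embedding table disagree about the vocabulary size. If it reports none, the out-of-range index is more likely to come from somewhere else, for example sequences longer than the model's maximum number of positions.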

Using:

  • NVIDIA-SMI 535.154.05
  • Driver Version: 535.154.05
  • CUDA Version: 12.2
  • NVIDIA GeForce RTX 3080 Ti 12GB RAM
  • Ubuntu 22.04.2 LTS
