2025-02-15 02:18:13,499 - training_args.py:2100 - _setup_devices - INFO - PyTorch: setting up devices
2025-02-15 02:18:14,065 - configuration_utils.py:731 - _get_config_dict - INFO - loading configuration file ./checkpoints/longvu_llama3_2/config.json
2025-02-15 02:18:14,068 - configuration_utils.py:800 - from_dict - INFO - Model config CambrianConfig { "_name_or_path": "/tmp/iopath_cache/manifold_cache/tree/users/shenx/finetune/09281004-cambrian_llama3_2_t576_ov", "architectures": ["CambrianLlamaForCausalLM"], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "connect_layer": 2, "connector_depth": 3, "connector_only": true, "dino_threshold": 0.83, "drop_threshold": 0.8, "eos_token_id": [128001, 128008, 128009], "frame_pos": false, "freeze_mm_mlp_adapter": false, "hidden_act": "silu", "hidden_size": 3072, "highres": true, "highres_connect": false, "image_aspect_ratio": "pad", "image_position": 91, "image_token_len": 144, "initializer_range": 0.02, "intermediate_size": 8192, "is_image_newline": true, "is_st_sampler": false, "lowres_token": 8, "max_position_embeddings": 131072, "mlp_bias": false, "mm_patch_merge_type": "flat", "mm_projector_lr": null, "mm_projector_type": "sva", "mm_use_im_patch_token": false, "mm_use_im_start_end": false, "mm_vision_sampler_lr": null, "mm_vision_select_feature": "patch", "mm_vision_select_layer": -2, "mm_vision_tower_aux_list": ["siglip/CLIP-ViT-SO400M-14-384", "facebook/dinov2-giant-res378"], "mm_vision_tower_aux_token_len_list": [576, 576], "mm_vision_tower_lr": null, "model_type": "cambrian_llama", "num_attention_heads": 24, "num_hidden_layers": 28, "num_key_value_heads": 8, "num_of_vision_sampler_layers": 10, "num_query_group": 1, "pretraining_tp": 1, "query_num_list": [144], "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 32.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "spmd_debug": null, "spmd_fsdp_sharding": null, "spmd_mesh": null, "start_of_vision_sampler_layers": 0, "stride_of_vision_sampler_layers": 3, "tie_word_embeddings": false, "tokenizer_model_max_length": 8192, "tokenizer_padding_side": "right", "torch_dtype": "float32", "transformers_version": "4.43.1", "tune_mm_mlp_adapter": false, "unfreeze_mm_vision_tower": false, "use_cache": false, "use_mm_proj": true, "vision_hidden_size": 1024, "vision_tower_aux_token_len_list": [576, 576], "vocab_size": 128256 }
2025-02-15 02:18:14,069 - modeling_utils.py:3618 - from_pretrained - INFO - loading weights file ./checkpoints/longvu_llama3_2/pytorch_model.bin
2025-02-15 02:18:14,106 - configuration_utils.py:1038 - from_dict - INFO - Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": [128001, 128008, 128009], "use_cache": false }
2025-02-15 02:18:14,324 - configuration_utils.py:733 - _get_config_dict - INFO - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--facebook--dinov2-giant/snapshots/611a9d42f2335e0f921f1e313ad3c1b7178d206d/config.json
2025-02-15 02:18:14,327 - configuration_utils.py:800 - from_dict - INFO - Model config Dinov2Config { "apply_layernorm": true, "architectures": ["Dinov2Model"], "attention_probs_dropout_prob": 0.0, "drop_path_rate": 0.0, "hidden_act": "gelu", "hidden_dropout_prob": 0.0, "hidden_size": 1536, "image_size": 518, "initializer_range": 0.02, "layer_norm_eps": 1e-06, "layerscale_value": 1.0, "mlp_ratio": 4, "model_type": "dinov2", "num_attention_heads": 24, "num_channels": 3, "num_hidden_layers": 40, "out_features": ["stage40"], "out_indices": [40], "patch_size": 14, "qkv_bias": true, "reshape_hidden_states": true, "stage_names": ["stem", "stage1", "stage2", "stage3", "stage4", "stage5", "stage6", "stage7", "stage8", "stage9", "stage10", "stage11", "stage12", "stage13", "stage14", "stage15", "stage16", "stage17", "stage18", "stage19", "stage20", "stage21", "stage22", "stage23", "stage24", "stage25", "stage26", "stage27", "stage28", "stage29", "stage30", "stage31", "stage32", "stage33", "stage34", "stage35", "stage36", "stage37", "stage38", "stage39", "stage40"], "torch_dtype": "float32", "transformers_version": "4.43.1", "use_swiglu_ffn": true }
2025-02-15 02:18:15,865 - modeling_utils.py:4450 - _load_pretrained_model - INFO - All model checkpoint weights were used when initializing CambrianLlamaForCausalLM.
2025-02-15 02:18:15,866 - modeling_utils.py:4458 - _load_pretrained_model - INFO - All the weights of CambrianLlamaForCausalLM were initialized from the model checkpoint at ./checkpoints/longvu_llama3_2. If your task is similar to the task the model of the checkpoint was trained on, you can already use CambrianLlamaForCausalLM for predictions without further training.
2025-02-15 02:18:15,871 - configuration_utils.py:991 - from_pretrained - INFO - loading configuration file ./checkpoints/longvu_llama3_2/generation_config.json
2025-02-15 02:18:15,872 - configuration_utils.py:1038 - from_dict - INFO - Generate config GenerationConfig { "bos_token_id": 128000, "do_sample": true, "eos_token_id": [128001, 128008, 128009], "temperature": 0.6, "top_p": 0.9 }
2025-02-15 02:18:16,109 - tokenization_utils_base.py:2287 - from_pretrained - INFO - loading file tokenizer.json
2025-02-15 02:18:16,110 - tokenization_utils_base.py:2287 - from_pretrained - INFO - loading file added_tokens.json
2025-02-15 02:18:16,110 - tokenization_utils_base.py:2287 - from_pretrained - INFO - loading file special_tokens_map.json
2025-02-15 02:18:16,110 - tokenization_utils_base.py:2287 - from_pretrained - INFO - loading file tokenizer_config.json
2025-02-15 02:18:16,519 - tokenization_utils_base.py:2533 - _from_pretrained - INFO - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-02-15 02:18:16,889 - configuration_utils.py:733 - _get_config_dict - INFO - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--google--siglip-so400m-patch14-384/snapshots/9fdffc58afc957d1a03a25b10dba0329ab15c2a3/config.json
2025-02-15 02:18:16,891 - configuration_utils.py:800 - from_dict - INFO - Model config SiglipVisionConfig { "attention_dropout": 0.0, "hidden_act": "gelu_pytorch_tanh", "hidden_size": 1152, "image_size": 384, "intermediate_size": 4304, "layer_norm_eps": 1e-06, "model_type": "siglip_vision_model", "num_attention_heads": 16, "num_channels": 3, "num_hidden_layers": 27, "patch_size": 14, "transformers_version": "4.43.1" }
2025-02-15 02:18:16,891 - modeling_utils.py:3621 - from_pretrained - INFO - loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--google--siglip-so400m-patch14-384/snapshots/9fdffc58afc957d1a03a25b10dba0329ab15c2a3/model.safetensors
2025-02-15 02:18:17,159 - modeling_utils.py:4440 - _load_pretrained_model - INFO - Some weights of the model checkpoint at google/siglip-so400m-patch14-384 were not used when initializing SiglipVisionModel: ['logit_bias', 'logit_scale', plus every text-tower tensor: text_model.embeddings.{position_embedding, token_embedding}.weight, text_model.encoder.layers.{0..26}.{layer_norm1, layer_norm2, mlp.fc1, mlp.fc2, self_attn.q_proj, self_attn.k_proj, self_attn.v_proj, self_attn.out_proj}.{weight, bias}, text_model.final_layer_norm.{weight, bias}, and text_model.head.{weight, bias}]
- This IS expected if you are initializing SiglipVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing SiglipVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2025-02-15 02:18:17,160 - modeling_utils.py:4458 - _load_pretrained_model - INFO - All the weights of SiglipVisionModel were initialized from the model checkpoint at google/siglip-so400m-patch14-384. If your task is similar to the task the model of the checkpoint was trained on, you can already use SiglipVisionModel for predictions without further training.
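The long list of skipped tensors is the "IS expected" case spelled out in the warning: google/siglip-so400m-patch14-384 is a dual-encoder (image plus text) checkpoint, and instantiating only the vision tower leaves the entire text tower and the contrastive logit_scale/logit_bias unused. A minimal sketch:

```python
from transformers import SiglipVisionModel

# Loading only the vision tower from the dual-encoder checkpoint skips every
# text_model.* tensor plus logit_scale and logit_bias, exactly as warned above.
vision_tower = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
```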
2025-02-15 02:18:17,365 - image_processing_base.py:375 - get_image_processor_dict - INFO - loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--google--siglip-so400m-patch14-384/snapshots/9fdffc58afc957d1a03a25b10dba0329ab15c2a3/preprocessor_config.json
2025-02-15 02:18:17,366 - image_processing_base.py:429 - from_dict - INFO - Image processor SiglipImageProcessor { "do_convert_rgb": null, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [0.5, 0.5, 0.5], "image_processor_type": "SiglipImageProcessor", "image_std": [0.5, 0.5, 0.5], "processor_class": "SiglipProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "height": 384, "width": 384 } }
2025-02-15 02:18:17,740 - configuration_utils.py:733 - _get_config_dict - INFO - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--facebook--dinov2-giant/snapshots/611a9d42f2335e0f921f1e313ad3c1b7178d206d/config.json
2025-02-15 02:18:17,744 - configuration_utils.py:800 - from_dict - INFO - Model config Dinov2Config { identical to the Dinov2Config printed at 02:18:14,327 above }
2025-02-15 02:18:17,744 - modeling_utils.py:3621 - from_pretrained - INFO - loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--facebook--dinov2-giant/snapshots/611a9d42f2335e0f921f1e313ad3c1b7178d206d/model.safetensors
2025-02-15 02:18:18,370 - modeling_utils.py:4450 - _load_pretrained_model - INFO - All model checkpoint weights were used when initializing Dinov2Model.
2025-02-15 02:18:18,371 - modeling_utils.py:4458 - _load_pretrained_model - INFO - All the weights of Dinov2Model were initialized from the model checkpoint at facebook/dinov2-giant. If your task is similar to the task the model of the checkpoint was trained on, you can already use Dinov2Model for predictions without further training.
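The two towers use different preprocessing, per the SiglipImageProcessor record above and the BitImageProcessor record that follows: SigLIP resizes to 384x384 (resample=3 is bicubic) and normalizes with mean=std=0.5, while the DINOv2 processor resizes the shortest edge to 378, center-crops to 378x378, and uses ImageNet statistics; both rescale by 0.00392156862745098, i.e. 1/255. A sketch of fetching them (the training script's own loading code is not shown):

```python
from transformers import AutoImageProcessor

siglip_proc = AutoImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
dino_proc = AutoImageProcessor.from_pretrained("facebook/dinov2-giant")

# The processors yield pixel_values with the per-frame shapes logged below:
# [N, 3, 384, 384] for SigLIP (images_0), [N, 3, 378, 378] for DINOv2 (images_1).
```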
2025-02-15 02:18:18,563 - image_processing_base.py:375 - get_image_processor_dict - INFO - loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--facebook--dinov2-giant/snapshots/611a9d42f2335e0f921f1e313ad3c1b7178d206d/preprocessor_config.json
2025-02-15 02:18:18,566 - image_processing_base.py:429 - from_dict - INFO - Image processor BitImageProcessor { "crop_size": { "height": 378, "width": 378 }, "do_center_crop": true, "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [0.485, 0.456, 0.406], "image_processor_type": "BitImageProcessor", "image_std": [0.229, 0.224, 0.225], "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "shortest_edge": 378 } }
2025-02-15 02:18:19,473 - finetune_llama.py:1239 - train - INFO - Total params: 3264865280
2025-02-15 02:18:19,473 - finetune_llama.py:1240 - train - INFO - Trainable params: 12589056
2025-02-15 02:18:19,473 - finetune_llama.py:1241 - train - INFO - LM head params: 394002432
2025-02-15 02:18:23,232 - trainer_callback.py:423 - add_callback - WARNING - You are adding a TensorBoardCallback to the callbacks of this Trainer, but there is already one. The current list of callbacks is: DefaultFlowCallback, TensorBoardCallback
2025-02-15 02:18:23,232 - trainer.py:648 - __init__ - INFO - Using auto half precision backend
2025-02-15 02:18:23,543 - trainer.py:2134 - _inner_training_loop - INFO - ***** Running training *****
2025-02-15 02:18:23,543 - trainer.py:2135 - _inner_training_loop - INFO - Num examples = 10
2025-02-15 02:18:23,543 - trainer.py:2136 - _inner_training_loop - INFO - Num Epochs = 2
2025-02-15 02:18:23,543 - trainer.py:2137 - _inner_training_loop - INFO - Instantaneous batch size per device = 1
2025-02-15 02:18:23,543 - trainer.py:2140 - _inner_training_loop - INFO - Total train batch size (w. parallel, distributed & accumulation) = 1
2025-02-15 02:18:23,543 - trainer.py:2141 - _inner_training_loop - INFO - Gradient Accumulation steps = 1
2025-02-15 02:18:23,543 - trainer.py:2142 - _inner_training_loop - INFO - Total optimization steps = 20
2025-02-15 02:18:23,545 - trainer.py:2143 - _inner_training_loop - INFO - Number of trainable parameters = 406,591,488
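The counts above are mutually consistent: the Trainer's 406,591,488 trainable parameters are the script's separately reported 12,589,056 "Trainable params" plus the 394,002,432 LM head params (and 394,002,432 = vocab_size 128256 x hidden_size 3072 from the CambrianConfig), while the 20 optimization steps follow from 10 examples x 2 epochs at an effective batch size of 1. A minimal sketch of how such counts are typically computed; the exact finetune_llama.py code is not shown:

```python
import torch

def param_counts(model: torch.nn.Module) -> tuple[int, int]:
    """Return (total, trainable) parameter counts for the train() log lines."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# Consistency checks against the records above:
assert 12_589_056 + 394_002_432 == 406_591_488  # Trainable + LM head = Trainer count
assert 128_256 * 3_072 == 394_002_432           # vocab_size * hidden_size = LM head
```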
2025-02-15 02:23:33,966 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0]
2025-02-15 02:23:33,991 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224
2025-02-15 02:23:33,996 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | images_0: [torch.Size([1, 3216, 3, 384, 384]), torch.float32, cuda:0]
2025-02-15 02:23:33,997 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | images_1: [torch.Size([1, 3216, 3, 378, 378]), torch.float32, cuda:0]
2025-02-15 02:24:23,452 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:870 | Device: cuda:0 | Time: 49.44 seconds | Allocated: 33686.93 -> 45101.66 MB (net +11414.73 MB) | Reserved: 34890.32 -> 45967.47 MB (net +11077.16 MB) | Peak allocated: 56482.91 MB
2025-02-15 02:24:23,825 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:876 | Device: cuda:0 | Time: 0.37 seconds | Allocated: 45101.66 -> 30839.01 MB (net -14262.66 MB) | Reserved: 45967.47 -> 72175.58 MB (net +26208.11 MB) | Peak allocated: 78153.51 MB
2025-02-15 02:24:25,811 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:891 | Device: cuda:0 | Time: 1.98 seconds | Allocated: 30839.01 -> 31369.85 MB (net +530.84 MB) | Reserved: 72175.58 -> 36385.59 MB (net -35790.00 MB) | Peak allocated: 35348.40 MB
2025-02-15 02:24:25,825 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:932 | Device: cuda:0 | Time: 0.01 seconds | Allocated: 31369.85 -> 33259.38 MB (net +1889.53 MB) | Reserved: 36385.59 -> 36387.68 MB (net +2.10 MB) | Peak allocated: 34676.81 MB
2025-02-15 02:24:26,038 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:950 | Device: cuda:0 | Time: 0.21 seconds | Allocated: 33259.38 -> 35501.24 MB (net +2241.86 MB) | Reserved: 36387.68 -> 42523.95 MB (net +6136.27 MB) | Peak allocated: 41045.52 MB
2025-02-15 02:24:26,039 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:927 | Device: cuda:0 | Time: 0.23 seconds | Allocated: 31369.85 -> 35501.24 MB (net +4131.39 MB) | Reserved: 36385.59 -> 42523.95 MB (net +6138.36 MB) | Peak allocated: 41045.52 MB
2025-02-15 02:24:26,197 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:1093 | Device: cuda:0 | Time: 0.15 seconds | Allocated: 37034.78 -> 37801.78 MB (net +767.00 MB) | Reserved: 42523.95 -> 42941.28 MB (net +417.33 MB) | Peak allocated: 38509.57 MB
2025-02-15 02:24:26,222 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:1394 | Device: cuda:0 | Time: 0.02 seconds | Allocated: 38214.67 -> 38443.78 MB (net +229.11 MB) | Reserved: 42941.28 -> 42941.28 MB (net 0.00 MB) | Peak allocated: 38643.00 MB
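Each "Section: ..." block above comes from a context manager in resource_logging.py that times a named region and snapshots CUDA memory around it. Its implementation is not shown in this log; the following is a minimal sketch, assuming the fields map onto torch.cuda.memory_allocated/memory_reserved (before, after, net) and max_memory_allocated (peak), with peak stats reset on entry so the peak is per-section:

```python
import logging
import time

import torch

logger = logging.getLogger(__name__)

class LogSection:
    """Sketch of a resource_logging-style section timer and memory logger."""

    def __init__(self, name: str, device: str = "cuda:0") -> None:
        self.name, self.device = name, device

    def __enter__(self) -> "LogSection":
        torch.cuda.reset_peak_memory_stats(self.device)  # assumed: per-section peak
        self.t0 = time.perf_counter()
        self.alloc0 = torch.cuda.memory_allocated(self.device)
        self.res0 = torch.cuda.memory_reserved(self.device)
        return self

    def __exit__(self, *exc) -> None:
        mb = 1024 ** 2
        alloc1 = torch.cuda.memory_allocated(self.device)
        res1 = torch.cuda.memory_reserved(self.device)
        logger.debug("Section name: %s", self.name)
        logger.debug("Time: %.2f seconds", time.perf_counter() - self.t0)
        logger.debug("Allocated: %.2f -> %.2f MB (net %+.2f MB)",
                     self.alloc0 / mb, alloc1 / mb, (alloc1 - self.alloc0) / mb)
        logger.debug("Reserved: %.2f -> %.2f MB (net %+.2f MB)",
                     self.res0 / mb, res1 / mb, (res1 - self.res0) / mb)
        logger.debug("Peak allocated: %.2f MB",
                     torch.cuda.max_memory_allocated(self.device) / mb)

# Usage matching the sections above, e.g.:
# with LogSection("encode_images:dino"):
#     features = dino_tower(pixel_values)
```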
2025-02-15 02:24:26,224 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal | File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py:309 | Device: cuda:0 | Time: 52.23 seconds | Allocated: 22481.08 -> 38644.80 MB (net +16163.72 MB) | Reserved: 23683.14 -> 42941.28 MB (net +19258.15 MB) | Peak allocated: 38644.80 MB
2025-02-15 02:24:26,258 - logging.py:328 - warning_once - WARNING - The attention layers in this model are transitioning from computing the RoPE embeddings internally through `position_ids` (2D tensor with the indexes of the tokens), to using externally computed `position_embeddings` (Tuple of tensors, containing cos and sin). In v4.45 `position_ids` will be removed and `position_embeddings` will be mandatory.
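The recurring debug_tensor records (inputs['labels'] and images_0/images_1 above, orig_logits/orig_labels below) print a [shape, dtype, device] triplet for a named tensor; the constant "File: Unknown, Line: Unknown" prefix suggests the helper's caller-frame lookup does not resolve in this setup. A minimal sketch of such a helper:

```python
import logging

import torch

logger = logging.getLogger(__name__)

def debug_tensor(name: str, t: torch.Tensor) -> None:
    # Emits the same triplet the log shows, e.g.
    # "images_0: [torch.Size([1, 3216, 3, 384, 384]), torch.float32, cuda:0]".
    logger.debug("%s: [%s, %s, %s]", name, t.shape, t.dtype, t.device)
```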
2025-02-15 02:24:26,525 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> model.forward | File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py:390 | Device: cuda:0 | Time: 0.30 seconds | Allocated: 24504.97 -> 27518.26 MB (net +3013.30 MB) | Reserved: 42941.28 -> 42941.28 MB (net 0.00 MB) | Peak allocated: 27819.56 MB
2025-02-15 02:24:26,543 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8160, cut from 8162
2025-02-15 02:24:26,546 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['2 final rate for this video is 1 (']
2025-02-15 02:24:26,554 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> lm_head, logits | File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py:456 | Device: cuda:0 | Time: 0.03 seconds | Allocated: 27518.26 -> 35955.73 MB (net +8437.47 MB) | Reserved: 42941.28 -> 47135.59 MB (net +4194.30 MB) | Peak allocated: 35955.73 MB
2025-02-15 02:24:26,711 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7952]
2025-02-15 02:24:26,713 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0]
2025-02-15 02:24:26,714 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0]
2025-02-15 02:24:26,719 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237]
2025-02-15 02:24:26,721 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | outs: [torch.Size([1, 12]), torch.int64, cuda:0]
2025-02-15 02:24:26,721 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['2 final rate for this video is 1 (']
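The "decoded outputs" records are consistent with greedy-decoding the logits over the reported output range and detokenizing: the range [225, 237) has length 12, matching the logged outs shape of [1, 12]. A hedged reconstruction, since the actual cambrian_llama.py code is not shown:

```python
import torch
from transformers import PreTrainedTokenizerBase

def decode_output_span(orig_logits: torch.Tensor,
                       tokenizer: PreTrainedTokenizerBase,
                       start: int = 225, end: int = 237) -> list[str]:
    """Greedy-decode logits over [start, end) and detokenize (sketch)."""
    # orig_logits: [1, 237, 128256] per the record above; argmax yields
    # token ids of shape [1, 12], decoded to the logged string.
    outs = orig_logits[:, start:end, :].argmax(dim=-1)
    return tokenizer.batch_decode(outs)
```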
2025-02-15 02:26:44,938 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0]
2025-02-15 02:26:44,944 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224
2025-02-15 02:26:44,948 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | images_0: [torch.Size([1, 3309, 3, 384, 384]), torch.float32, cuda:0]
2025-02-15 02:26:44,949 - resource_logging.py:42-45 - debug_tensor - DEBUG - File: Unknown, Line: Unknown | images_1: [torch.Size([1, 3309, 3, 378, 378]), torch.float32, cuda:0]
2025-02-15 02:27:35,746 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:870 | Device: cuda:0 | Time: 50.78 seconds | Allocated: 36026.48 -> 47736.98 MB (net +11710.50 MB) | Reserved: 72909.59 -> 55165.58 MB (net -17744.00 MB) | Peak allocated: 59447.34 MB
2025-02-15 02:27:36,072 - resource_logging.py:148-158 - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame | File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py:876 | Device: cuda:0 | Time: 0.32 seconds | Allocated: 47736.98 -> 32980.37 MB (net -14756.60 MB) | Reserved: 55165.58 -> 77514.93 MB (net +22349.35 MB) | Peak allocated: 81351.29 MB
02:27:36,072 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 81351.29 MB 2025-02-15 02:27:38,010 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip 2025-02-15 02:27:38,011 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 891 2025-02-15 02:27:38,011 - resource_logging.py:150 - __exit__ - DEBUG - Time: 1.94 seconds 2025-02-15 02:27:38,011 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:27:38,011 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 32980.37 MB 2025-02-15 02:27:38,011 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 33511.22 MB 2025-02-15 02:27:38,011 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 530.84 MB 2025-02-15 02:27:38,011 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 77514.93 MB 2025-02-15 02:27:38,011 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 37599.84 MB 2025-02-15 02:27:38,011 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -39915.09 MB 2025-02-15 02:27:38,011 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 37490.80 MB 2025-02-15 02:27:38,024 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 2025-02-15 02:27:38,024 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 932 2025-02-15 02:27:38,024 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.01 seconds 2025-02-15 02:27:38,024 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:27:38,024 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 33511.22 MB 2025-02-15 02:27:38,024 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 35400.75 MB 2025-02-15 02:27:38,025 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 1889.53 MB 2025-02-15 02:27:38,025 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 37599.84 MB 2025-02-15 02:27:38,025 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 38543.56 MB 2025-02-15 02:27:38,025 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 943.72 MB 2025-02-15 02:27:38,025 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 36818.18 MB 2025-02-15 02:27:38,233 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group 2025-02-15 02:27:38,233 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 950 2025-02-15 02:27:38,233 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.21 seconds 2025-02-15 02:27:38,233 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:27:38,233 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 35400.75 MB 2025-02-15 02:27:38,233 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 37642.61 MB 2025-02-15 02:27:38,233 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 2241.86 MB 2025-02-15 02:27:38,233 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 38543.56 MB 2025-02-15 02:27:38,233 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 45149.59 MB 2025-02-15 
2025-02-15 02:27:38,234 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA (cambrian_arch.py:927, cuda:0) | Time: 0.22 s | Allocated: 33511.22 -> 37642.61 MB (net +4131.39) | Reserved: 37599.84 -> 45149.59 MB (net +7549.75) | Peak allocated: 43186.89 MB
2025-02-15 02:27:38,394 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding (cambrian_arch.py:1093, cuda:0) | Time: 0.15 s | Allocated: 39176.15 -> 39943.15 MB (net +767.00) | Reserved: 45149.59 -> 45562.72 MB (net +413.14) | Peak allocated: 40650.94 MB
2025-02-15 02:27:38,412 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC (cambrian_arch.py:1394, cuda:0) | Time: 0.02 s | Allocated: 40356.04 -> 40584.02 MB (net +227.98) | Reserved: 45562.72 -> 45562.72 MB (net 0.00) | Peak allocated: 40805.06 MB
2025-02-15 02:27:38,413 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal (/root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py:309, cuda:0) | Time: 53.46 s | Allocated: 24497.59 -> 40783.91 MB (net +16286.32) | Reserved: 67054.34 -> 45562.72 MB (net -21491.61) | Peak allocated: 40805.06 MB
2025-02-15 02:27:38,680 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> model.forward (cambrian_llama.py:390, cuda:0) | Time: 0.27 s | Allocated: 40783.91 -> 29483.70 MB (net -11300.21) | Reserved: 45562.72 -> 45562.72 MB (net 0.00) | Peak allocated: 42581.69 MB
2025-02-15 02:27:38,698 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8114, cut from 8116
2025-02-15 02:27:38,698 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['2 final rate for this video is 2 (']
2025-02-15 02:27:38,704 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> lm_head, logits (cambrian_llama.py:456, cuda:0) | Time: 0.02 s | Allocated: 29483.70 -> 37872.84 MB (net +8389.15) | Reserved: 45562.72 -> 49733.96 MB (net +4171.24) | Peak allocated: 37872.84 MB
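Each "Section: ..." record above is emitted by a timing/VRAM context manager in resource_logging.py. Its implementation is not shown in this log; the sketch below is one plausible way to produce the same fields with public torch.cuda APIs:

    import logging
    import time
    import torch

    logger = logging.getLogger("resource_logging")
    MB = 1024 ** 2

    class TrackResources:
        """Context manager approximating the per-section records above."""

        def __init__(self, name: str, device: str = "cuda:0"):
            self.name, self.device = name, device

        def __enter__(self):
            torch.cuda.synchronize(self.device)           # settle async kernels before measuring
            torch.cuda.reset_peak_memory_stats(self.device)
            self.alloc0 = torch.cuda.memory_allocated(self.device)
            self.res0 = torch.cuda.memory_reserved(self.device)
            self.t0 = time.perf_counter()
            return self

        def __exit__(self, *exc):
            torch.cuda.synchronize(self.device)
            dt = time.perf_counter() - self.t0
            alloc1 = torch.cuda.memory_allocated(self.device)
            res1 = torch.cuda.memory_reserved(self.device)
            peak = torch.cuda.max_memory_allocated(self.device)
            logger.debug(
                "Section: %s (%s) | Time: %.2f s | Allocated: %.2f -> %.2f MB (net %+.2f) | "
                "Reserved: %.2f -> %.2f MB (net %+.2f) | Peak allocated: %.2f MB",
                self.name, self.device, dt,
                self.alloc0 / MB, alloc1 / MB, (alloc1 - self.alloc0) / MB,
                self.res0 / MB, res1 / MB, (res1 - self.res0) / MB, peak / MB,
            )

    # Usage, e.g.:
    #   with TrackResources("encode_images:dino"):
    #       feats = vision_tower_aux(frames)
    # Caveat: reset_peak_memory_stats inside a nested section also resets the
    # enclosing section's peak, so nested blocks (SVA and its sub-blocks above)
    # would need a shared peak tracker in practice.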
2025-02-15 02:27:38,860 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7906]
2025-02-15 02:27:38,862 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0]
2025-02-15 02:27:38,863 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0]
2025-02-15 02:27:38,867 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237]
2025-02-15 02:27:38,868 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0]
2025-02-15 02:27:38,869 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['2 final rate for this video is 2 (']
2025-02-15 02:30:06,187 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0]
2025-02-15 02:30:06,192 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224
2025-02-15 02:30:06,196 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 3227, 3, 384, 384]), torch.float32, cuda:0]
2025-02-15 02:30:06,197 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 3227, 3, 378, 378]), torch.float32, cuda:0]
2025-02-15 02:30:55,739 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino (cambrian_arch.py:870, cuda:0) | Time: 49.53 s | Allocated: 35455.42 -> 46876.51 MB (net +11421.09) | Reserved: 69321.36 -> 54572.09 MB (net -14749.27) | Peak allocated: 58296.68 MB
2025-02-15 02:30:56,021 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame (cambrian_arch.py:876, cuda:0) | Time: 0.28 s | Allocated: 46876.51 -> 32554.48 MB (net -14322.03) | Reserved: 54572.09 -> 76980.16 MB (net +22408.07) | Peak allocated: 80572.30 MB
2025-02-15 02:30:57,950 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip (cambrian_arch.py:891, cuda:0) | Time: 1.93 s | Allocated: 32554.48 -> 33085.32 MB (net +530.84) | Reserved: 76980.16 -> 37295.75 MB (net -39684.41) | Peak allocated: 37064.16 MB
2025-02-15 02:30:57,963 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 (cambrian_arch.py:932, cuda:0) | Time: 0.01 s | Allocated: 33085.32 -> 34974.85 MB (net +1889.53) | Reserved: 37295.75 -> 38239.47 MB (net +943.72) | Peak allocated: 36392.28 MB
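The select_frame step consistently frees roughly 14 GB of allocated activations: the ~3200-3300 input frames are pruned after DINOv2 encoding and before the SigLIP branch runs. In LongVU this pruning is driven by DINOv2 feature similarity between neighbouring frames; the sketch below shows the general idea only (an illustrative simplification with an assumed threshold, not the repository's exact select_frame code):

    import torch
    import torch.nn.functional as F

    def select_frames(dino_feats: torch.Tensor, threshold: float = 0.8) -> torch.Tensor:
        """dino_feats: (T, N, D) per-frame DINOv2 patch features.
        Keep frame 0, then drop any frame whose mean-pooled feature is nearly
        identical (cosine similarity > threshold) to the last kept frame."""
        pooled = dino_feats.mean(dim=1)                  # (T, D)
        keep = [0]
        for t in range(1, pooled.size(0)):
            sim = F.cosine_similarity(pooled[t], pooled[keep[-1]], dim=0)
            if sim.item() <= threshold:
                keep.append(t)
        return torch.tensor(keep, dtype=torch.long)      # indices of kept frames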
2025-02-15 02:30:58,170 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group (cambrian_arch.py:950, cuda:0) | Time: 0.20 s | Allocated: 34974.85 -> 37216.71 MB (net +2241.86) | Reserved: 38239.47 -> 44845.50 MB (net +6606.03) | Peak allocated: 42760.99 MB
2025-02-15 02:30:58,171 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA (cambrian_arch.py:927, cuda:0) | Time: 0.22 s | Allocated: 33085.32 -> 37216.71 MB (net +4131.39) | Reserved: 37295.75 -> 44845.50 MB (net +7549.75) | Peak allocated: 42760.99 MB
2025-02-15 02:30:58,331 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding (cambrian_arch.py:1093, cuda:0) | Time: 0.15 s | Allocated: 38750.25 -> 39517.25 MB (net +767.00) | Reserved: 44845.50 -> 45260.73 MB (net +415.24) | Peak allocated: 40225.04 MB
2025-02-15 02:30:58,350 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC (cambrian_arch.py:1394, cuda:0) | Time: 0.02 s | Allocated: 39930.14 -> 40159.43 MB (net +229.29) | Reserved: 45260.73 -> 45260.73 MB (net 0.00) | Peak allocated: 40398.03 MB
2025-02-15 02:30:58,351 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal (cambrian_llama.py:309, cuda:0) | Time: 52.15 s | Allocated: 24212.29 -> 40360.33 MB (net +16148.04) | Reserved: 69321.36 -> 45260.73 MB (net -24060.62) | Peak allocated: 40398.03 MB
2025-02-15 02:30:58,619 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> model.forward (cambrian_llama.py:390, cuda:0) | Time: 0.27 s | Allocated: 40360.33 -> 29214.01 MB (net -11146.32) | Reserved: 45260.73 -> 45260.73 MB (net 0.00) | Peak allocated: 42869.85 MB
2025-02-15 02:30:58,637 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8155, cut from 8157
2025-02-15 02:30:58,637 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['2 final rate for this video is 2 (']
2025-02-15 02:30:58,643 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> lm_head, logits (cambrian_llama.py:456, cuda:0) | Time: 0.02 s | Allocated: 29214.01 -> 37645.48 MB (net +8431.46) | Reserved: 45260.73 -> 49452.94 MB (net +4192.21) | Peak allocated: 37645.48 MB
2025-02-15 02:30:58,802 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7947]
2025-02-15 02:30:58,803 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0]
2025-02-15 02:30:58,804 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0]
2025-02-15 02:30:58,809 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237]
2025-02-15 02:30:58,810 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0]
2025-02-15 02:30:58,810 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['2 final rate for this video is 2 (']
2025-02-15 02:33:18,756 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0]
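Records like "Found assistant token at index 8155, cut from 8157" and "output range: [225, 237]" show the forward pass locating the assistant header in the multimodal-expanded sequence and decoding only the answer span for logging; that is where the "decoded outputs: ['2 final rate ...']" strings come from. A rough sketch of that bookkeeping (a hypothetical helper; the token id, the +2 offset, and the -100 masking are read off the log and common HF conventions, not the actual cambrian_llama.py source):

    import torch

    def debug_decode_spans(tokenizer, labels, logits, assistant_token_id):
        """labels: (1, L) int64; logits: (1, L', vocab).
        Find the last assistant-header token, then decode (a) the target
        tokens after it and (b) the model's greedy predictions over the
        same span, mirroring the paired INFO records above."""
        pos = (labels[0] == assistant_token_id).nonzero(as_tuple=True)[0]
        start = int(pos[-1]) + 2          # skip header plus separator,
                                          # cf. "index 8155, cut from 8157"
        target = labels[0, start:]
        target = target[target != -100]   # drop ignore_index padding, if any
        preds = logits[0, start - 1 : start - 1 + target.numel()].argmax(-1)
        return (
            tokenizer.decode(target, skip_special_tokens=True),
            tokenizer.decode(preds, skip_special_tokens=True),
        )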
2025-02-15 02:33:18,762 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224
2025-02-15 02:33:18,766 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 3297, 3, 384, 384]), torch.float32, cuda:0]
2025-02-15 02:33:18,767 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 3297, 3, 378, 378]), torch.float32, cuda:0]
2025-02-15 02:34:09,525 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino (cambrian_arch.py:870, cuda:0) | Time: 50.75 s | Allocated: 35944.24 -> 47612.80 MB (net +11668.55) | Reserved: 75159.83 -> 55039.75 MB (net -20120.08) | Peak allocated: 59280.70 MB
2025-02-15 02:34:09,816 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame (cambrian_arch.py:876, cuda:0) | Time: 0.29 s | Allocated: 47612.80 -> 32919.11 MB (net -14693.69) | Reserved: 55039.75 -> 77370.23 MB (net +22330.47) | Peak allocated: 81192.16 MB
2025-02-15 02:34:11,747 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip (cambrian_arch.py:891, cuda:0) | Time: 1.93 s | Allocated: 32919.11 -> 33449.95 MB (net +530.84) | Reserved: 77370.23 -> 37536.92 MB (net -39833.31) | Peak allocated: 37429.54 MB
2025-02-15 02:34:11,761 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 (cambrian_arch.py:932, cuda:0) | Time: 0.01 s | Allocated: 33449.95 -> 35339.49 MB (net +1889.53) | Reserved: 37536.92 -> 38480.64 MB (net +943.72) | Peak allocated: 36756.92 MB
2025-02-15 02:34:11,967 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group (cambrian_arch.py:950, cuda:0) | Time: 0.20 s | Allocated: 35339.49 -> 37581.34 MB (net +2241.86) | Reserved: 38480.64 -> 45086.67 MB (net +6606.03) | Peak allocated: 43125.62 MB
2025-02-15 02:34:11,968 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA (cambrian_arch.py:927, cuda:0) | Time: 0.22 s | Allocated: 33449.95 -> 37581.34 MB (net +4131.39) | Reserved: 37536.92 -> 45086.67 MB (net +7549.75) | Peak allocated: 43125.62 MB
2025-02-15 02:34:12,129 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding (cambrian_arch.py:1093, cuda:0) | Time: 0.16 s | Allocated: 39114.89 -> 39881.89 MB (net +767.00) | Reserved: 45086.67 -> 45504.00 MB (net +417.33) | Peak allocated: 40589.68 MB
2025-02-15 02:34:12,147 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC (cambrian_arch.py:1394, cuda:0) | Time: 0.02 s | Allocated: 40294.78 -> 40523.16 MB (net +228.38) | Reserved: 45504.00 -> 45504.00 MB (net 0.00) | Peak allocated: 40738.01 MB
2025-02-15 02:34:12,149 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal (cambrian_llama.py:309, cuda:0) | Time: 53.38 s | Allocated: 24456.90 -> 40723.45 MB (net +16266.54) | Reserved: 69325.55 -> 45504.00 MB (net -23821.55) | Peak allocated: 40738.01 MB
2025-02-15 02:34:12,417 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> model.forward (cambrian_llama.py:390, cuda:0) | Time: 0.27 s | Allocated: 40723.45 -> 29449.10 MB (net -11274.34) | Reserved: 45504.00 -> 45504.00 MB (net 0.00) | Peak allocated: 43225.28 MB
2025-02-15 02:34:12,435 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8130, cut from 8132
2025-02-15 02:34:12,435 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['2 final rate for this video is 1 (']
2025-02-15 02:34:12,441 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> lm_head, logits (cambrian_llama.py:456, cuda:0) | Time: 0.02 s | Allocated: 29449.10 -> 37854.76 MB (net +8405.66) | Reserved: 45504.00 -> 49683.63 MB (net +4179.62) | Peak allocated: 37854.76 MB
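To compare sections across training steps it helps to tabulate these records. A small parser for the one-record-per-line format used above (it assumes exactly the field labels shown here; numeric fields are left as strings for the caller to cast):

    import re

    RECORD = re.compile(
        r"Section: (?P<section>.+?) \((?P<loc>[^,]+), (?P<device>[^)]+)\) \| "
        r"Time: (?P<time_s>[\d.]+) s \| "
        r"Allocated: (?P<alloc0>[\d.]+) -> (?P<alloc1>[\d.]+) MB \(net (?P<d_alloc>[+-]?[\d.]+)\) \| "
        r"Reserved: (?P<res0>[\d.]+) -> (?P<res1>[\d.]+) MB \(net (?P<d_res>[+-]?[\d.]+)\)"
    )

    def parse_profile(log_text: str) -> list[dict]:
        """One dict per profiling record; incomplete (truncated) records are skipped."""
        return [m.groupdict() for line in log_text.splitlines()
                if (m := RECORD.search(line))]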
2025-02-15 02:34:12,600 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7922]
2025-02-15 02:34:12,601 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0]
2025-02-15 02:34:12,602 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0]
2025-02-15 02:34:12,607 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237]
2025-02-15 02:34:12,608 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0]
2025-02-15 02:34:12,608 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['2 final rate for this video is 1 (']
2025-02-15 02:36:11,623 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0]
2025-02-15 02:36:11,628 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224
2025-02-15 02:36:11,633 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 3305, 3, 384, 384]), torch.float32, cuda:0]
2025-02-15 02:36:11,634 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 3305, 3, 378, 378]), torch.float32, cuda:0]
2025-02-15 02:37:02,483 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino (cambrian_arch.py:870, cuda:0) | Time: 50.82 s | Allocated: 36000.18 -> 47696.39 MB (net +11696.21) | Reserved: 75409.39 -> 55113.15 MB (net -20296.24) | Peak allocated: 59392.60 MB
2025-02-15 02:37:02,777 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame (cambrian_arch.py:876, cuda:0) | Time: 0.29 s | Allocated: 47696.39 -> 32960.37 MB (net -14736.02) | Reserved: 55113.15 -> 77905.00 MB (net +22791.85) | Peak allocated: 81971.19 MB
2025-02-15 02:37:04,709 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip (cambrian_arch.py:891, cuda:0) | Time: 1.93 s | Allocated: 32960.37 -> 33491.21 MB (net +530.84) | Reserved: 77905.00 -> 37566.28 MB (net -40338.72) | Peak allocated: 37470.79 MB
2025-02-15 02:37:04,723 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 (cambrian_arch.py:932, cuda:0) | Time: 0.01 s | Allocated: 33491.21 -> 35380.47 MB (net +1889.26) | Reserved: 37566.28 -> 38510.00 MB (net +943.72) | Peak allocated: 36797.90 MB
2025-02-15 02:37:04,930 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group (cambrian_arch.py:950, cuda:0) | Time: 0.21 s | Allocated: 35380.47 -> 37622.32 MB (net +2241.86) | Reserved: 38510.00 -> 45116.03 MB (net +6606.03) | Peak allocated: 43166.60 MB
2025-02-15 02:37:04,931 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA (cambrian_arch.py:927, cuda:0) | Time: 0.22 s | Allocated: 33491.21 -> 37622.32 MB (net +4131.12) | Reserved: 37566.28 -> 45116.03 MB (net +7549.75) | Peak allocated: 43166.60 MB
2025-02-15 02:37:05,094 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding (cambrian_arch.py:1093, cuda:0) | Time: 0.16 s | Allocated: 39155.87 -> 39922.87 MB (net +767.00) | Reserved: 45116.03 -> 45533.36 MB (net +417.33) | Peak allocated: 40630.66 MB
2025-02-15 02:37:05,112 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC (cambrian_arch.py:1394, cuda:0) | Time: 0.02 s | Allocated: 40335.76 -> 40564.52 MB (net +228.76) | Reserved: 45533.36 -> 45533.36 MB (net 0.00) | Peak allocated: 40783.44 MB
2025-02-15 02:37:05,114 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal (cambrian_llama.py:309, cuda:0) | Time: 53.48 s | Allocated: 24484.44 -> 40765.25 MB (net +16280.81) | Reserved: 69560.43 -> 45533.36 MB (net -24027.07) | Peak allocated: 40783.44 MB
2025-02-15 02:37:05,384 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> model.forward (cambrian_llama.py:390, cuda:0) | Time: 0.27 s | Allocated: 40765.25 -> 29483.50 MB (net -11281.75) | Reserved: 45533.36 -> 45533.36 MB (net 0.00) | Peak allocated: 43272.62 MB
2025-02-15 02:37:05,402 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8148, cut from 8150
2025-02-15 02:37:05,402 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['2 final rate for this video is 2 (']
2025-02-15 02:37:05,408 - resource_logging.py - __exit__ - DEBUG - Section: CambrianLlamaForCausalLM -> forward -> lm_head, logits (cambrian_llama.py:456, cuda:0) | Time: 0.02 s | Allocated: 29483.50 -> 37907.74 MB (net +8424.24) | Reserved: 45533.36 -> 49721.38 MB (net +4188.01) | Peak allocated: 37907.74 MB
2025-02-15 02:37:05,567 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7940]
2025-02-15 02:37:05,569 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0]
2025-02-15 02:37:05,570 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0]
2025-02-15 02:37:05,574 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237]
2025-02-15 02:37:05,575 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0]
2025-02-15 02:37:05,575 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['2 final rate for this video is 2 (']
2025-02-15 02:39:19,333 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0]
2025-02-15 02:39:19,339 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224
2025-02-15 02:39:19,343 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 3208, 3, 384, 384]), torch.float32, cuda:0]
2025-02-15 02:39:19,344 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 3208, 3, 378, 378]), torch.float32, cuda:0]
2025-02-15 02:40:08,690 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino (cambrian_arch.py:870, cuda:0) | Time: 49.33 s | Allocated: 35323.45 -> 46677.44 MB (net +11353.98) | Reserved: 69275.22 -> 54425.29 MB (net -14849.93) | Peak allocated: 58030.37 MB
2025-02-15 02:40:08,802 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame (cambrian_arch.py:876, cuda:0) | Time: 0.11 s | Allocated: 46677.44 -> 32456.13 MB (net -14221.30) | Reserved: 54425.29 -> 57992.54 MB (net +3567.26) | Peak allocated: 55172.22 MB
2025-02-15 02:40:10,729 - resource_logging.py - __exit__ - DEBUG - Section: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip (cambrian_arch.py:891, cuda:0) | Time: 1.93 s | Allocated: 32456.13 -> 32986.97 MB (net +530.84) | Reserved: 57992.54 -> 38637.93 MB (net -19354.62) | Peak allocated: 36965.52 MB
2025-02-15 02:40:10,729 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip
2025-02-15 02:40:10,729 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 891
2025-02-15 02:40:10,729 - resource_logging.py:150 - __exit__ - DEBUG - Time: 1.93 seconds
2025-02-15 02:40:10,729 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:10,729 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 32456.13 MB
2025-02-15 02:40:10,729 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 32986.97 MB
2025-02-15 02:40:10,729 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 530.84 MB
2025-02-15 02:40:10,729 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 57992.54 MB
2025-02-15 02:40:10,729 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 38637.93 MB
2025-02-15 02:40:10,729 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -19354.62 MB
2025-02-15 02:40:10,729 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 36965.52 MB
2025-02-15 02:40:10,742 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1
2025-02-15 02:40:10,742 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 932
2025-02-15 02:40:10,742 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.01 seconds
2025-02-15 02:40:10,742 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:10,742 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 32986.97 MB
2025-02-15 02:40:10,742 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 34876.51 MB
2025-02-15 02:40:10,742 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 1889.53 MB
2025-02-15 02:40:10,742 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 38637.93 MB
2025-02-15 02:40:10,742 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 38637.93 MB
2025-02-15 02:40:10,742 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB
2025-02-15 02:40:10,742 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 36293.94 MB
2025-02-15 02:40:10,948 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group
2025-02-15 02:40:10,948 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 950
2025-02-15 02:40:10,948 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.20 seconds
2025-02-15 02:40:10,948 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:10,948 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 34876.51 MB
2025-02-15 02:40:10,948 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 37118.36 MB
2025-02-15 02:40:10,948 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 2241.86 MB
2025-02-15 02:40:10,948 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 38637.93 MB
2025-02-15 02:40:10,948 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44300.24 MB
2025-02-15 02:40:10,948 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 5662.31 MB
2025-02-15 02:40:10,948 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 42662.65 MB
2025-02-15 02:40:10,949 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA
2025-02-15 02:40:10,949 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 927
2025-02-15 02:40:10,949 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.22 seconds
2025-02-15 02:40:10,949 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:10,949 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 32986.97 MB
2025-02-15 02:40:10,949 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 37118.36 MB
2025-02-15 02:40:10,949 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 4131.39 MB
2025-02-15 02:40:10,949 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 38637.93 MB
2025-02-15 02:40:10,949 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44300.24 MB
2025-02-15 02:40:10,949 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 5662.31 MB
2025-02-15 02:40:10,949 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 42662.65 MB
2025-02-15 02:40:11,112 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding
2025-02-15 02:40:11,112 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1093
2025-02-15 02:40:11,112 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.16 seconds
2025-02-15 02:40:11,112 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:11,112 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 38651.91 MB
2025-02-15 02:40:11,112 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 39418.91 MB
2025-02-15 02:40:11,112 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 767.00 MB
2025-02-15 02:40:11,112 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 44300.24 MB
2025-02-15 02:40:11,112 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44715.47 MB
2025-02-15 02:40:11,112 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 415.24 MB
2025-02-15 02:40:11,112 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 40126.70 MB
2025-02-15 02:40:11,131 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC
2025-02-15 02:40:11,131 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1394
2025-02-15 02:40:11,131 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds
2025-02-15 02:40:11,131 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:11,131 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 39831.80 MB
2025-02-15 02:40:11,131 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 40059.06 MB
2025-02-15 02:40:11,131 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 227.26 MB
2025-02-15 02:40:11,131 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 44715.47 MB
2025-02-15 02:40:11,131 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44715.47 MB
2025-02-15 02:40:11,131 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB
2025-02-15 02:40:11,131 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 40253.26 MB
2025-02-15 02:40:11,133 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal
2025-02-15 02:40:11,133 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 309
2025-02-15 02:40:11,133 - resource_logging.py:150 - __exit__ - DEBUG - Time: 51.79 seconds
2025-02-15 02:40:11,133 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:11,133 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 24146.53 MB
2025-02-15 02:40:11,133 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 40260.13 MB
2025-02-15 02:40:11,133 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 16113.61 MB
2025-02-15 02:40:11,133 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 69275.22 MB
2025-02-15 02:40:11,133 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44715.47 MB
2025-02-15 02:40:11,133 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -24559.75 MB
2025-02-15 02:40:11,133 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 40260.13 MB
2025-02-15 02:40:11,405 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> model.forward
2025-02-15 02:40:11,405 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 390
2025-02-15 02:40:11,405 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.27 seconds
2025-02-15 02:40:11,405 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:11,405 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 40260.13 MB
2025-02-15 02:40:11,405 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 29150.92 MB
2025-02-15 02:40:11,405 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -11109.22 MB
2025-02-15 02:40:11,405 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 44715.47 MB
2025-02-15 02:40:11,405 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44715.47 MB
2025-02-15 02:40:11,405 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB
2025-02-15 02:40:11,405 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 42771.80 MB
2025-02-15 02:40:11,423 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8162, cut from 8164
2025-02-15 02:40:11,423 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['The final rate for this video is 2 (']
2025-02-15 02:40:11,429 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> lm_head, logits
2025-02-15 02:40:11,429 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 456
2025-02-15 02:40:11,429 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds
2025-02-15 02:40:11,429 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:40:11,429 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 29150.92 MB
2025-02-15 02:40:11,429 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 37589.61 MB
2025-02-15 02:40:11,429 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 8438.69 MB
2025-02-15 02:40:11,429 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 44715.47 MB
2025-02-15 02:40:11,429 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44715.47 MB
2025-02-15 02:40:11,429 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB
2025-02-15 02:40:11,429 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 37589.61 MB
2025-02-15 02:40:11,588 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7954]
2025-02-15 02:40:11,589 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:40:11,589 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0]
2025-02-15 02:40:11,590 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:40:11,590 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0]
2025-02-15 02:40:11,595 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237]
2025-02-15 02:40:11,596 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:40:11,596 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0]
2025-02-15 02:40:11,596 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['The final rate for this video is 2 (']
2025-02-15 02:42:30,171 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:42:30,171 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0]
2025-02-15 02:42:30,179 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224
2025-02-15 02:42:30,186 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:42:30,186 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 3124, 3, 384, 384]), torch.float32, cuda:0]
2025-02-15 02:42:30,188 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:42:30,188 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 3124, 3, 378, 378]), torch.float32, cuda:0]
2025-02-15 02:43:18,433 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino
2025-02-15 02:43:18,433 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 870
2025-02-15 02:43:18,433 - resource_logging.py:150 - __exit__ - DEBUG - Time: 48.23 seconds
2025-02-15 02:43:18,433 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:18,433 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 34737.50 MB
2025-02-15 02:43:18,433 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 45793.69 MB
2025-02-15 02:43:18,433 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 11056.19 MB
2025-02-15 02:43:18,433 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 63990.40 MB
2025-02-15 02:43:18,433 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 53487.86 MB
2025-02-15 02:43:18,433 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -10502.54 MB
2025-02-15 02:43:18,433 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 56849.35 MB
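Each eleven-line block above (Section name through Peak allocated) is the __exit__ output of a timing-and-memory context manager wrapped around one stage of the forward pass. A minimal sketch of such a manager, built on PyTorch's real memory-introspection APIs; the class name, constructor, and exact wording are assumptions inferred from the log, not the actual resource_logging.py:

```python
# Hedged sketch of a CUDA time/memory logging context manager that would
# produce the "Section name ... Peak allocated" blocks seen in this log.
import logging
import time

import torch

logger = logging.getLogger(__name__)
MB = 1024 ** 2


class LogResourceUsage:
    def __init__(self, section_name: str, device: str = "cuda:0") -> None:
        self.section_name = section_name
        self.device = device

    def __enter__(self) -> "LogResourceUsage":
        torch.cuda.reset_peak_memory_stats(self.device)
        self.start_time = time.time()
        self.alloc_before = torch.cuda.memory_allocated(self.device)
        self.reserved_before = torch.cuda.memory_reserved(self.device)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        elapsed = time.time() - self.start_time
        alloc_after = torch.cuda.memory_allocated(self.device)
        reserved_after = torch.cuda.memory_reserved(self.device)
        peak = torch.cuda.max_memory_allocated(self.device)
        logger.debug("Section name: %s", self.section_name)
        logger.debug("Time: %.2f seconds", elapsed)
        logger.debug("Device: %s", self.device)
        logger.debug("Allocated before block: %.2f MB", self.alloc_before / MB)
        logger.debug("Allocated after block: %.2f MB", alloc_after / MB)
        logger.debug("Net allocated change: %.2f MB", (alloc_after - self.alloc_before) / MB)
        logger.debug("Reserved before block: %.2f MB", self.reserved_before / MB)
        logger.debug("Reserved after block: %.2f MB", reserved_after / MB)
        logger.debug("Net reserved change: %.2f MB", (reserved_after - self.reserved_before) / MB)
        logger.debug("Peak allocated: %.2f MB", peak / MB)
```

When reading these blocks, note that memory_allocated() counts live tensor storage while memory_reserved() counts the caching allocator's pool, which is why Net reserved change can swing sharply negative (cache returned or re-segmented) even in sections whose allocations grow.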
2025-02-15 02:43:18,704 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame
2025-02-15 02:43:18,704 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 876
2025-02-15 02:43:18,704 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.27 seconds
2025-02-15 02:43:18,704 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:18,704 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 45793.69 MB
2025-02-15 02:43:18,704 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 32018.81 MB
2025-02-15 02:43:18,704 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -13774.87 MB
2025-02-15 02:43:18,704 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 53487.86 MB
2025-02-15 02:43:18,704 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 74885.10 MB
2025-02-15 02:43:18,704 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 21397.24 MB
2025-02-15 02:43:18,704 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 77968.67 MB
2025-02-15 02:43:20,694 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip
2025-02-15 02:43:20,694 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 891
2025-02-15 02:43:20,694 - resource_logging.py:150 - __exit__ - DEBUG - Time: 1.99 seconds
2025-02-15 02:43:20,694 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:20,694 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 32018.81 MB
2025-02-15 02:43:20,695 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 32549.66 MB
2025-02-15 02:43:20,695 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 530.84 MB
2025-02-15 02:43:20,695 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 74885.10 MB
2025-02-15 02:43:20,695 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 36763.07 MB
2025-02-15 02:43:20,695 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -38122.03 MB
2025-02-15 02:43:20,695 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 36528.20 MB
2025-02-15 02:43:20,711 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1
2025-02-15 02:43:20,711 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 932
2025-02-15 02:43:20,711 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.01 seconds
2025-02-15 02:43:20,711 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:20,712 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 32549.66 MB
2025-02-15 02:43:20,712 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 34438.88 MB
2025-02-15 02:43:20,712 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 1889.22 MB
2025-02-15 02:43:20,712 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 36763.07 MB
2025-02-15 02:43:20,712 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 37706.79 MB
2025-02-15 02:43:20,712 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 943.72 MB
2025-02-15 02:43:20,712 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 35856.31 MB
2025-02-15 02:43:20,978 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group
2025-02-15 02:43:20,978 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 950
2025-02-15 02:43:20,978 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.26 seconds
2025-02-15 02:43:20,978 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:20,978 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 34438.88 MB
2025-02-15 02:43:20,978 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 36680.74 MB
2025-02-15 02:43:20,978 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 2241.86 MB
2025-02-15 02:43:20,978 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 37706.79 MB
2025-02-15 02:43:20,978 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44312.82 MB
2025-02-15 02:43:20,978 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 6606.03 MB
2025-02-15 02:43:20,978 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 42225.02 MB
2025-02-15 02:43:20,980 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA
2025-02-15 02:43:20,980 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 927
2025-02-15 02:43:20,980 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.28 seconds
2025-02-15 02:43:20,980 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:20,980 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 32549.66 MB
2025-02-15 02:43:20,980 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 36680.74 MB
2025-02-15 02:43:20,980 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 4131.08 MB
2025-02-15 02:43:20,980 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 36763.07 MB
2025-02-15 02:43:20,980 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44312.82 MB
2025-02-15 02:43:20,980 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 7549.75 MB
2025-02-15 02:43:20,980 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 42225.02 MB
2025-02-15 02:43:21,265 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding
2025-02-15 02:43:21,265 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1093
2025-02-15 02:43:21,265 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.28 seconds
2025-02-15 02:43:21,265 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:21,265 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 38214.28 MB
2025-02-15 02:43:21,265 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 38981.28 MB
2025-02-15 02:43:21,265 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 767.00 MB
2025-02-15 02:43:21,265 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 44312.82 MB
2025-02-15 02:43:21,265 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44728.06 MB
2025-02-15 02:43:21,265 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 415.24 MB
2025-02-15 02:43:21,265 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 39689.07 MB
2025-02-15 02:43:21,294 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC
2025-02-15 02:43:21,295 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1394
2025-02-15 02:43:21,295 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds
2025-02-15 02:43:21,295 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:21,295 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 39394.17 MB
2025-02-15 02:43:21,295 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 39623.78 MB
2025-02-15 02:43:21,295 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 229.61 MB
2025-02-15 02:43:21,295 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 44728.06 MB
2025-02-15 02:43:21,295 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44728.06 MB
2025-02-15 02:43:21,295 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB
2025-02-15 02:43:21,295 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 39864.78 MB
2025-02-15 02:43:21,297 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal
2025-02-15 02:43:21,297 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 309
2025-02-15 02:43:21,297 - resource_logging.py:150 - __exit__ - DEBUG - Time: 51.11 seconds
2025-02-15 02:43:21,297 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:21,297 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 23853.23 MB
2025-02-15 02:43:21,297 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 39824.39 MB
2025-02-15 02:43:21,297 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 15971.15 MB
2025-02-15 02:43:21,297 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 63990.40 MB
2025-02-15 02:43:21,297 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44728.06 MB
2025-02-15 02:43:21,297 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -19262.34 MB
2025-02-15 02:43:21,297 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 39864.78 MB
2025-02-15 02:43:21,582 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> model.forward
2025-02-15 02:43:21,582 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 390
2025-02-15 02:43:21,582 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.28 seconds
2025-02-15 02:43:21,582 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:21,582 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 39824.39 MB
2025-02-15 02:43:21,582 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 28850.39 MB
2025-02-15 02:43:21,582 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -10974.00 MB
2025-02-15 02:43:21,582 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 44728.06 MB
2025-02-15 02:43:21,582 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 44728.06 MB
2025-02-15 02:43:21,582 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB
2025-02-15 02:43:21,582 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 42330.22 MB
2025-02-15 02:43:21,600 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8143, cut from 8145
2025-02-15 02:43:21,600 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['2 final rate for this video is 2,']
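The Found assistant token at index N, cut from M lines, together with the Decoded assistant outputs that follow, show forward() locating the assistant header near the end of the 8192-token sequence and greedily decoding the dozen positions after it as a training-time preview of the predicted answer. A hedged reconstruction of that bookkeeping; the token id and function name are illustrative assumptions, not the LongVidLLaMA code:

```python
# Hedged sketch of the assistant-token preview logged above.
import torch

ASSISTANT_TOKEN_ID = 78191  # hypothetical id for the "assistant" header token


def preview_assistant_output(logits, labels, tokenizer):
    # logits: [1, seq_len, vocab]; labels: [1, seq_len]
    positions = (labels[0] == ASSISTANT_TOKEN_ID).nonzero(as_tuple=True)[0]
    if positions.numel() == 0:
        return None
    start = int(positions[-1]) + 1          # tokens after the assistant header
    preds = logits[0, start:].argmax(dim=-1)  # greedy ids for the answer span
    return tokenizer.batch_decode(preds.unsqueeze(0))
```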
2025-02-15 02:43:21,607 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> lm_head, logits
2025-02-15 02:43:21,607 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 456
2025-02-15 02:43:21,607 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds
2025-02-15 02:43:21,607 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0
2025-02-15 02:43:21,607 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 28850.39 MB
2025-02-15 02:43:21,607 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 37269.46 MB
2025-02-15 02:43:21,607 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 8419.08 MB
2025-02-15 02:43:21,607 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 44728.06 MB
2025-02-15 02:43:21,607 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 48913.97 MB
2025-02-15 02:43:21,607 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 4185.92 MB
2025-02-15 02:43:21,607 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 37269.46 MB
2025-02-15 02:43:21,773 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7935]
2025-02-15 02:43:21,774 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:43:21,774 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0]
2025-02-15 02:43:21,775 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:43:21,775 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0]
2025-02-15 02:43:21,780 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237]
2025-02-15 02:43:21,781 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown
2025-02-15 02:43:21,781 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0]
2025-02-15 02:43:21,781 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['2 final rate for this video is 2,']
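As a sanity check on the sizes logged above, assuming contiguous float32 storage at 4 bytes per element, the shapes convert to megabytes as follows; logits over the full 8192-token sequence land in the same multi-GB range as the jumps recorded at the lm_head sections:

```python
# Shape-to-memory arithmetic for the tensors logged above (float32 assumed).
def tensor_mb(*shape: int, bytes_per_elem: int = 4) -> float:
    """Size of a contiguous tensor with the given shape, in MB."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem / 1024 ** 2

print(tensor_mb(1, 237, 128256))        # orig_logits (answer span only): ~116 MB
print(tensor_mb(1, 8192, 128256))       # logits over the full 8192-token sequence: ~4008 MB
print(tensor_mb(1, 3124, 3, 384, 384))  # images_0 frame batch: ~5272 MB
```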