2025-02-15 02:47:54,153 - training_args.py:2100 - _setup_devices - INFO - PyTorch: setting up devices 2025-02-15 02:47:54,659 - configuration_utils.py:731 - _get_config_dict - INFO - loading configuration file ./checkpoints/longvu_llama3_2/config.json 2025-02-15 02:47:54,662 - configuration_utils.py:800 - from_dict - INFO - Model config CambrianConfig { "_name_or_path": "/tmp/iopath_cache/manifold_cache/tree/users/shenx/finetune/09281004-cambrian_llama3_2_t576_ov", "architectures": [ "CambrianLlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "connect_layer": 2, "connector_depth": 3, "connector_only": true, "dino_threshold": 0.83, "drop_threshold": 0.8, "eos_token_id": [ 128001, 128008, 128009 ], "frame_pos": false, "freeze_mm_mlp_adapter": false, "hidden_act": "silu", "hidden_size": 3072, "highres": true, "highres_connect": false, "image_aspect_ratio": "pad", "image_position": 91, "image_token_len": 144, "initializer_range": 0.02, "intermediate_size": 8192, "is_image_newline": true, "is_st_sampler": false, "lowres_token": 8, "max_position_embeddings": 131072, "mlp_bias": false, "mm_patch_merge_type": "flat", "mm_projector_lr": null, "mm_projector_type": "sva", "mm_use_im_patch_token": false, "mm_use_im_start_end": false, "mm_vision_sampler_lr": null, "mm_vision_select_feature": "patch", "mm_vision_select_layer": -2, "mm_vision_tower_aux_list": [ "siglip/CLIP-ViT-SO400M-14-384", "facebook/dinov2-giant-res378" ], "mm_vision_tower_aux_token_len_list": [ 576, 576 ], "mm_vision_tower_lr": null, "model_type": "cambrian_llama", "num_attention_heads": 24, "num_hidden_layers": 28, "num_key_value_heads": 8, "num_of_vision_sampler_layers": 10, "num_query_group": 1, "pretraining_tp": 1, "query_num_list": [ 144 ], "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 32.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "spmd_debug": null, "spmd_fsdp_sharding": null, "spmd_mesh": null, "start_of_vision_sampler_layers": 0, "stride_of_vision_sampler_layers": 3, "tie_word_embeddings": false, "tokenizer_model_max_length": 8192, "tokenizer_padding_side": "right", "torch_dtype": "float32", "transformers_version": "4.43.1", "tune_mm_mlp_adapter": false, "unfreeze_mm_vision_tower": false, "use_cache": false, "use_mm_proj": true, "vision_hidden_size": 1024, "vision_tower_aux_token_len_list": [ 576, 576 ], "vocab_size": 128256 } 2025-02-15 02:47:54,663 - modeling_utils.py:3618 - from_pretrained - INFO - loading weights file ./checkpoints/longvu_llama3_2/pytorch_model.bin 2025-02-15 02:47:54,698 - configuration_utils.py:1038 - from_dict - INFO - Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": [ 128001, 128008, 128009 ], "use_cache": false } 2025-02-15 02:47:55,217 - configuration_utils.py:733 - _get_config_dict - INFO - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--facebook--dinov2-giant/snapshots/611a9d42f2335e0f921f1e313ad3c1b7178d206d/config.json 2025-02-15 02:47:55,220 - configuration_utils.py:800 - from_dict - INFO - Model config Dinov2Config { "apply_layernorm": true, "architectures": [ "Dinov2Model" ], "attention_probs_dropout_prob": 0.0, "drop_path_rate": 0.0, "hidden_act": "gelu", "hidden_dropout_prob": 0.0, "hidden_size": 1536, "image_size": 518, "initializer_range": 0.02, "layer_norm_eps": 1e-06, "layerscale_value": 1.0, "mlp_ratio": 4, "model_type": "dinov2", "num_attention_heads": 24, 
"num_channels": 3, "num_hidden_layers": 40, "out_features": [ "stage40" ], "out_indices": [ 40 ], "patch_size": 14, "qkv_bias": true, "reshape_hidden_states": true, "stage_names": [ "stem", "stage1", "stage2", "stage3", "stage4", "stage5", "stage6", "stage7", "stage8", "stage9", "stage10", "stage11", "stage12", "stage13", "stage14", "stage15", "stage16", "stage17", "stage18", "stage19", "stage20", "stage21", "stage22", "stage23", "stage24", "stage25", "stage26", "stage27", "stage28", "stage29", "stage30", "stage31", "stage32", "stage33", "stage34", "stage35", "stage36", "stage37", "stage38", "stage39", "stage40" ], "torch_dtype": "float32", "transformers_version": "4.43.1", "use_swiglu_ffn": true } 2025-02-15 02:47:56,611 - modeling_utils.py:4450 - _load_pretrained_model - INFO - All model checkpoint weights were used when initializing CambrianLlamaForCausalLM. 2025-02-15 02:47:56,611 - modeling_utils.py:4458 - _load_pretrained_model - INFO - All the weights of CambrianLlamaForCausalLM were initialized from the model checkpoint at ./checkpoints/longvu_llama3_2. If your task is similar to the task the model of the checkpoint was trained on, you can already use CambrianLlamaForCausalLM for predictions without further training. 2025-02-15 02:47:56,616 - configuration_utils.py:991 - from_pretrained - INFO - loading configuration file ./checkpoints/longvu_llama3_2/generation_config.json 2025-02-15 02:47:56,617 - configuration_utils.py:1038 - from_dict - INFO - Generate config GenerationConfig { "bos_token_id": 128000, "do_sample": true, "eos_token_id": [ 128001, 128008, 128009 ], "temperature": 0.6, "top_p": 0.9 } 2025-02-15 02:47:56,846 - tokenization_utils_base.py:2287 - from_pretrained - INFO - loading file tokenizer.json 2025-02-15 02:47:56,847 - tokenization_utils_base.py:2287 - from_pretrained - INFO - loading file added_tokens.json 2025-02-15 02:47:56,847 - tokenization_utils_base.py:2287 - from_pretrained - INFO - loading file special_tokens_map.json 2025-02-15 02:47:56,847 - tokenization_utils_base.py:2287 - from_pretrained - INFO - loading file tokenizer_config.json 2025-02-15 02:47:57,246 - tokenization_utils_base.py:2533 - _from_pretrained - INFO - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
2025-02-15 02:47:57,922 - configuration_utils.py:733 - _get_config_dict - INFO - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--google--siglip-so400m-patch14-384/snapshots/9fdffc58afc957d1a03a25b10dba0329ab15c2a3/config.json
2025-02-15 02:47:57,925 - configuration_utils.py:800 - from_dict - INFO - Model config SiglipVisionConfig { "attention_dropout": 0.0, "hidden_act": "gelu_pytorch_tanh", "hidden_size": 1152, "image_size": 384, "intermediate_size": 4304, "layer_norm_eps": 1e-06, "model_type": "siglip_vision_model", "num_attention_heads": 16, "num_channels": 3, "num_hidden_layers": 27, "patch_size": 14, "transformers_version": "4.43.1" }
2025-02-15 02:47:57,925 - modeling_utils.py:3621 - from_pretrained - INFO - loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--google--siglip-so400m-patch14-384/snapshots/9fdffc58afc957d1a03a25b10dba0329ab15c2a3/model.safetensors
2025-02-15 02:47:58,194 - modeling_utils.py:4440 - _load_pretrained_model - INFO - Some weights of the model checkpoint at google/siglip-so400m-patch14-384 were not used when initializing SiglipVisionModel: ['logit_bias', 'logit_scale', and every text_model.* parameter (token and position embeddings, all 27 text encoder layers, final_layer_norm, and head)]
- This IS expected if you are initializing SiglipVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing SiglipVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2025-02-15 02:47:58,196 - modeling_utils.py:4458 - _load_pretrained_model - INFO - All the weights of SiglipVisionModel were initialized from the model checkpoint at google/siglip-so400m-patch14-384. If your task is similar to the task the model of the checkpoint was trained on, you can already use SiglipVisionModel for predictions without further training.
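The "unused weights" warning above is the expected consequence of instantiating only the vision half of the SigLIP dual encoder: the text tower and the contrastive logit scale/bias have nowhere to go. A minimal sketch of the equivalent load:

```python
# Load only the SigLIP vision tower; text_model.* weights in the checkpoint are
# intentionally discarded, which is exactly what the warning reports.
from transformers import SiglipVisionModel, SiglipImageProcessor

siglip_id = "google/siglip-so400m-patch14-384"
vision_tower = SiglipVisionModel.from_pretrained(siglip_id)
siglip_processor = SiglipImageProcessor.from_pretrained(siglip_id)  # 384x384, mean/std 0.5
```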
2025-02-15 02:47:58,385 - image_processing_base.py:375 - get_image_processor_dict - INFO - loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--google--siglip-so400m-patch14-384/snapshots/9fdffc58afc957d1a03a25b10dba0329ab15c2a3/preprocessor_config.json 2025-02-15 02:47:58,386 - image_processing_base.py:429 - from_dict - INFO - Image processor SiglipImageProcessor { "do_convert_rgb": null, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.5, 0.5, 0.5 ], "image_processor_type": "SiglipImageProcessor", "image_std": [ 0.5, 0.5, 0.5 ], "processor_class": "SiglipProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "height": 384, "width": 384 } } 2025-02-15 02:47:58,770 - configuration_utils.py:733 - _get_config_dict - INFO - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--facebook--dinov2-giant/snapshots/611a9d42f2335e0f921f1e313ad3c1b7178d206d/config.json 2025-02-15 02:47:58,774 - configuration_utils.py:800 - from_dict - INFO - Model config Dinov2Config { "apply_layernorm": true, "architectures": [ "Dinov2Model" ], "attention_probs_dropout_prob": 0.0, "drop_path_rate": 0.0, "hidden_act": "gelu", "hidden_dropout_prob": 0.0, "hidden_size": 1536, "image_size": 518, "initializer_range": 0.02, "layer_norm_eps": 1e-06, "layerscale_value": 1.0, "mlp_ratio": 4, "model_type": "dinov2", "num_attention_heads": 24, "num_channels": 3, "num_hidden_layers": 40, "out_features": [ "stage40" ], "out_indices": [ 40 ], "patch_size": 14, "qkv_bias": true, "reshape_hidden_states": true, "stage_names": [ "stem", "stage1", "stage2", "stage3", "stage4", "stage5", "stage6", "stage7", "stage8", "stage9", "stage10", "stage11", "stage12", "stage13", "stage14", "stage15", "stage16", "stage17", "stage18", "stage19", "stage20", "stage21", "stage22", "stage23", "stage24", "stage25", "stage26", "stage27", "stage28", "stage29", "stage30", "stage31", "stage32", "stage33", "stage34", "stage35", "stage36", "stage37", "stage38", "stage39", "stage40" ], "torch_dtype": "float32", "transformers_version": "4.43.1", "use_swiglu_ffn": true } 2025-02-15 02:47:58,774 - modeling_utils.py:3621 - from_pretrained - INFO - loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--facebook--dinov2-giant/snapshots/611a9d42f2335e0f921f1e313ad3c1b7178d206d/model.safetensors 2025-02-15 02:47:59,345 - modeling_utils.py:4450 - _load_pretrained_model - INFO - All model checkpoint weights were used when initializing Dinov2Model. 2025-02-15 02:47:59,345 - modeling_utils.py:4458 - _load_pretrained_model - INFO - All the weights of Dinov2Model were initialized from the model checkpoint at facebook/dinov2-giant. If your task is similar to the task the model of the checkpoint was trained on, you can already use Dinov2Model for predictions without further training. 
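The DINOv2 auxiliary tower is loaded the same way. In the sketch below, the 378x378 preprocessing that the next records show for BitImageProcessor is passed as an explicit override; whether LongVU overrides it this way or ships a modified preprocessor config is an assumption:

```python
# Hedged sketch: facebook/dinov2-giant as a plain Dinov2Model, with the image
# processor forced to the 378x378 resolution seen in the logged BitImageProcessor.
from transformers import Dinov2Model, AutoImageProcessor

dino_id = "facebook/dinov2-giant"
dino_tower = Dinov2Model.from_pretrained(dino_id)
dino_processor = AutoImageProcessor.from_pretrained(
    dino_id,
    size={"shortest_edge": 378},
    crop_size={"height": 378, "width": 378},
)
```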
2025-02-15 02:47:59,835 - image_processing_base.py:375 - get_image_processor_dict - INFO - loading configuration file preprocessor_config.json from cache at /root/.cache/huggingface/hub/models--facebook--dinov2-giant/snapshots/611a9d42f2335e0f921f1e313ad3c1b7178d206d/preprocessor_config.json
2025-02-15 02:47:59,839 - image_processing_base.py:429 - from_dict - INFO - Image processor BitImageProcessor { "crop_size": { "height": 378, "width": 378 }, "do_center_crop": true, "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.485, 0.456, 0.406 ], "image_processor_type": "BitImageProcessor", "image_std": [ 0.229, 0.224, 0.225 ], "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "shortest_edge": 378 } }
2025-02-15 02:48:00,652 - finetune_llama.py:1239 - train - INFO - Total params: 3264865280
2025-02-15 02:48:00,652 - finetune_llama.py:1240 - train - INFO - Trainable params: 12589056
2025-02-15 02:48:00,652 - finetune_llama.py:1241 - train - INFO - LM head params: 394002432
2025-02-15 02:48:03,400 - trainer_callback.py:423 - add_callback - WARNING - You are adding a callback to the callbacks of this Trainer, but there is already one. The current list of callbacks is: DefaultFlowCallback, TensorBoardCallback
2025-02-15 02:48:03,401 - trainer.py:648 - __init__ - INFO - Using auto half precision backend
2025-02-15 02:48:03,943 - trainer.py:2134 - _inner_training_loop - INFO - ***** Running training *****
2025-02-15 02:48:03,944 - trainer.py:2135 - _inner_training_loop - INFO - Num examples = 540
2025-02-15 02:48:03,944 - trainer.py:2136 - _inner_training_loop - INFO - Num Epochs = 2
2025-02-15 02:48:03,944 - trainer.py:2137 - _inner_training_loop - INFO - Instantaneous batch size per device = 1
2025-02-15 02:48:03,944 - trainer.py:2140 - _inner_training_loop - INFO - Total train batch size (w.
parallel, distributed & accumulation) = 1 2025-02-15 02:48:03,944 - trainer.py:2141 - _inner_training_loop - INFO - Gradient Accumulation steps = 1 2025-02-15 02:48:03,944 - trainer.py:2142 - _inner_training_loop - INFO - Total optimization steps = 1,080 2025-02-15 02:48:03,946 - trainer.py:2143 - _inner_training_loop - INFO - Number of trainable parameters = 406,591,488 2025-02-15 02:49:04,560 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:04,560 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0] 2025-02-15 02:49:04,586 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224 2025-02-15 02:49:04,590 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:04,590 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 1244, 3, 384, 384]), torch.float32, cuda:0] 2025-02-15 02:49:04,591 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:04,591 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 1244, 3, 378, 378]), torch.float32, cuda:0] 2025-02-15 02:49:23,651 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino 2025-02-15 02:49:23,651 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 871 2025-02-15 02:49:23,651 - resource_logging.py:150 - __exit__ - DEBUG - Time: 19.05 seconds 2025-02-15 02:49:23,651 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:23,651 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 19945.19 MB 2025-02-15 02:49:23,651 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 24381.13 MB 2025-02-15 02:49:23,651 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 4435.94 MB 2025-02-15 02:49:23,651 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 21149.78 MB 2025-02-15 02:49:23,651 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 25249.71 MB 2025-02-15 02:49:23,651 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 4099.93 MB 2025-02-15 02:49:23,651 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 33301.23 MB 2025-02-15 02:49:23,817 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame 2025-02-15 02:49:23,817 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 877 2025-02-15 02:49:23,817 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.16 seconds 2025-02-15 02:49:23,817 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:23,817 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 24381.13 MB 2025-02-15 02:49:23,817 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 20586.93 MB 2025-02-15 02:49:23,817 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -3794.20 MB 2025-02-15 02:49:23,817 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 25249.71 MB 2025-02-15 02:49:23,817 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 36014.39 MB 2025-02-15 02:49:23,817 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved 
change: 10764.68 MB 2025-02-15 02:49:23,817 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 37374.40 MB 2025-02-15 02:49:25,772 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip 2025-02-15 02:49:25,772 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 892 2025-02-15 02:49:25,772 - resource_logging.py:150 - __exit__ - DEBUG - Time: 1.95 seconds 2025-02-15 02:49:25,772 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:25,772 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 20586.93 MB 2025-02-15 02:49:25,772 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 21117.77 MB 2025-02-15 02:49:25,772 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 530.84 MB 2025-02-15 02:49:25,772 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 36014.39 MB 2025-02-15 02:49:25,772 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22944.94 MB 2025-02-15 02:49:25,772 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -13069.45 MB 2025-02-15 02:49:25,772 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 25097.36 MB 2025-02-15 02:49:25,786 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 2025-02-15 02:49:25,786 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 933 2025-02-15 02:49:25,786 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.01 seconds 2025-02-15 02:49:25,786 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:25,786 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 21117.77 MB 2025-02-15 02:49:25,787 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 23007.14 MB 2025-02-15 02:49:25,787 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 1889.36 MB 2025-02-15 02:49:25,787 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22944.94 MB 2025-02-15 02:49:25,787 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 25778.19 MB 2025-02-15 02:49:25,787 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 2833.25 MB 2025-02-15 02:49:25,787 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 24424.57 MB 2025-02-15 02:49:26,007 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group 2025-02-15 02:49:26,007 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 951 2025-02-15 02:49:26,007 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.22 seconds 2025-02-15 02:49:26,007 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:26,007 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 23007.14 MB 2025-02-15 02:49:26,007 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 25248.99 MB 2025-02-15 02:49:26,007 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 2241.86 MB 2025-02-15 02:49:26,007 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 25778.19 MB 2025-02-15 02:49:26,007 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after 
block: 32388.42 MB 2025-02-15 02:49:26,007 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 6610.22 MB 2025-02-15 02:49:26,007 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 30795.37 MB 2025-02-15 02:49:26,007 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA 2025-02-15 02:49:26,008 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 928 2025-02-15 02:49:26,008 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.23 seconds 2025-02-15 02:49:26,008 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:26,008 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 21117.77 MB 2025-02-15 02:49:26,008 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 25248.99 MB 2025-02-15 02:49:26,008 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 4131.22 MB 2025-02-15 02:49:26,008 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22944.94 MB 2025-02-15 02:49:26,008 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 32388.42 MB 2025-02-15 02:49:26,008 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 9443.48 MB 2025-02-15 02:49:26,008 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 30795.37 MB 2025-02-15 02:49:26,213 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding 2025-02-15 02:49:26,213 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1094 2025-02-15 02:49:26,213 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.20 seconds 2025-02-15 02:49:26,213 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:26,213 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 26782.54 MB 2025-02-15 02:49:26,213 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 27550.59 MB 2025-02-15 02:49:26,213 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 768.05 MB 2025-02-15 02:49:26,213 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 32388.42 MB 2025-02-15 02:49:26,213 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 32805.75 MB 2025-02-15 02:49:26,214 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 417.33 MB 2025-02-15 02:49:26,214 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 28258.38 MB 2025-02-15 02:49:26,240 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC 2025-02-15 02:49:26,240 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1395 2025-02-15 02:49:26,240 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds 2025-02-15 02:49:26,240 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:26,240 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 27963.48 MB 2025-02-15 02:49:26,240 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 28192.24 MB 2025-02-15 02:49:26,240 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 228.77 MB 2025-02-15 02:49:26,240 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 
32805.75 MB 2025-02-15 02:49:26,240 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 32805.75 MB 2025-02-15 02:49:26,240 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:49:26,240 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 28428.20 MB 2025-02-15 02:49:26,242 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal 2025-02-15 02:49:26,242 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 309 2025-02-15 02:49:26,242 - resource_logging.py:150 - __exit__ - DEBUG - Time: 21.65 seconds 2025-02-15 02:49:26,242 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:26,242 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 15610.21 MB 2025-02-15 02:49:26,242 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 28393.10 MB 2025-02-15 02:49:26,242 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 12782.88 MB 2025-02-15 02:49:26,242 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 16812.87 MB 2025-02-15 02:49:26,242 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 32805.75 MB 2025-02-15 02:49:26,242 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 15992.88 MB 2025-02-15 02:49:26,242 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 28428.20 MB 2025-02-15 02:49:26,271 - logging.py:328 - warning_once - WARNING - The attention layers in this model are transitioning from computing the RoPE embeddings internally through `position_ids` (2D tensor with the indexes of the tokens), to using externally computed `position_embeddings` (Tuple of tensors, containing cos and sin). In v4.45 `position_ids` will be removed and `position_embeddings` will be mandatory. 
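The repeated DEBUG blocks above come from a timing and CUDA-memory context manager in resource_logging.py. A hedged reconstruction of that pattern, using only standard torch.cuda counters; the class and field names here are illustrative, not the repo's actual implementation:

```python
# Illustrative sketch of a section profiler that would emit fields like those
# logged above (time, net allocated/reserved change, peak allocated).
import logging
import time

import torch

logger = logging.getLogger(__name__)


class CudaSection:
    """Log wall time and CUDA memory deltas for a wrapped code block."""

    def __init__(self, name: str, device: str = "cuda:0") -> None:
        self.name, self.device = name, device

    def __enter__(self):
        torch.cuda.reset_peak_memory_stats(self.device)
        self.alloc0 = torch.cuda.memory_allocated(self.device)
        self.reserved0 = torch.cuda.memory_reserved(self.device)
        self.t0 = time.time()
        return self

    def __exit__(self, *exc):
        mb = 1024 ** 2
        alloc1 = torch.cuda.memory_allocated(self.device)
        reserved1 = torch.cuda.memory_reserved(self.device)
        logger.debug("Section name: %s", self.name)
        logger.debug("Device: %s", self.device)
        logger.debug("Time: %.2f seconds", time.time() - self.t0)
        logger.debug("Net allocated change: %.2f MB", (alloc1 - self.alloc0) / mb)
        logger.debug("Net reserved change: %.2f MB", (reserved1 - self.reserved0) / mb)
        logger.debug("Peak allocated: %.2f MB",
                     torch.cuda.max_memory_allocated(self.device) / mb)
        return False
```

A block such as "encode_images:dino" above would then correspond to wrapping the vision-tower forward pass in `with CudaSection("encode_images:dino"): ...`.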
2025-02-15 02:49:26,537 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> model.forward 2025-02-15 02:49:26,537 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 390 2025-02-15 02:49:26,537 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.29 seconds 2025-02-15 02:49:26,537 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:26,537 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 17634.89 MB 2025-02-15 02:49:26,537 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 20645.61 MB 2025-02-15 02:49:26,537 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 3010.72 MB 2025-02-15 02:49:26,537 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 32805.75 MB 2025-02-15 02:49:26,537 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 32805.75 MB 2025-02-15 02:49:26,537 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:49:26,537 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 20946.64 MB 2025-02-15 02:49:26,555 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8153, cut from 8155 2025-02-15 02:49:26,558 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['2 final rate for this video is 2,'] 2025-02-15 02:49:26,567 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> lm_head, logits 2025-02-15 02:49:26,567 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 456 2025-02-15 02:49:26,567 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.03 seconds 2025-02-15 02:49:26,567 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:26,567 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 20645.61 MB 2025-02-15 02:49:26,567 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 29076.01 MB 2025-02-15 02:49:26,567 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 8430.40 MB 2025-02-15 02:49:26,567 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 32805.75 MB 2025-02-15 02:49:26,567 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 41185.97 MB 2025-02-15 02:49:26,567 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 8380.22 MB 2025-02-15 02:49:26,567 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 29076.01 MB 2025-02-15 02:49:26,729 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7945] 2025-02-15 02:49:26,730 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:26,730 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0] 2025-02-15 02:49:26,731 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:26,731 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0] 2025-02-15 02:49:26,736 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237] 2025-02-15 02:49:26,737 - 
resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:26,737 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0] 2025-02-15 02:49:26,737 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['2 final rate for this video is 2,'] 2025-02-15 02:49:38,456 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:38,456 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0] 2025-02-15 02:49:38,462 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224 2025-02-15 02:49:38,467 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:38,467 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 136, 3, 384, 384]), torch.float32, cuda:0] 2025-02-15 02:49:38,468 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:38,468 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 136, 3, 378, 378]), torch.float32, cuda:0] 2025-02-15 02:49:40,583 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino 2025-02-15 02:49:40,583 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 871 2025-02-15 02:49:40,583 - resource_logging.py:150 - __exit__ - DEBUG - Time: 2.11 seconds 2025-02-15 02:49:40,583 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:40,583 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 13916.38 MB 2025-02-15 02:49:40,583 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 14398.72 MB 2025-02-15 02:49:40,583 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 482.34 MB 2025-02-15 02:49:40,583 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 49566.19 MB 2025-02-15 02:49:40,583 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 19314.77 MB 2025-02-15 02:49:40,583 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -30251.42 MB 2025-02-15 02:49:40,583 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 23387.75 MB 2025-02-15 02:49:40,593 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame 2025-02-15 02:49:40,593 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 877 2025-02-15 02:49:40,593 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.01 seconds 2025-02-15 02:49:40,593 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:40,593 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 14398.72 MB 2025-02-15 02:49:40,593 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 14588.72 MB 2025-02-15 02:49:40,593 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 190.00 MB 2025-02-15 02:49:40,593 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 19314.77 MB 2025-02-15 02:49:40,593 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 19314.77 MB 2025-02-15 02:49:40,593 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:49:40,593 - 
resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 16224.77 MB 2025-02-15 02:49:41,224 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip 2025-02-15 02:49:41,224 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 892 2025-02-15 02:49:41,224 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.63 seconds 2025-02-15 02:49:41,224 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,224 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 14588.72 MB 2025-02-15 02:49:41,224 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 14761.18 MB 2025-02-15 02:49:41,224 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 172.46 MB 2025-02-15 02:49:41,224 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 19314.77 MB 2025-02-15 02:49:41,224 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 18842.91 MB 2025-02-15 02:49:41,224 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -471.86 MB 2025-02-15 02:49:41,224 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 18759.34 MB 2025-02-15 02:49:41,230 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 2025-02-15 02:49:41,231 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 933 2025-02-15 02:49:41,231 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.00 seconds 2025-02-15 02:49:41,231 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,231 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 14761.18 MB 2025-02-15 02:49:41,231 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 15375.13 MB 2025-02-15 02:49:41,231 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 613.95 MB 2025-02-15 02:49:41,231 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 18842.91 MB 2025-02-15 02:49:41,231 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 18842.91 MB 2025-02-15 02:49:41,231 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:49:41,231 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 15835.80 MB 2025-02-15 02:49:41,302 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group 2025-02-15 02:49:41,302 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 951 2025-02-15 02:49:41,302 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.07 seconds 2025-02-15 02:49:41,302 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,302 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 15375.13 MB 2025-02-15 02:49:41,302 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 16103.78 MB 2025-02-15 02:49:41,302 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 728.65 MB 2025-02-15 02:49:41,302 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 18842.91 MB 2025-02-15 02:49:41,302 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 18842.91 MB 2025-02-15 02:49:41,302 - 
resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:49:41,302 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 17905.62 MB 2025-02-15 02:49:41,302 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA 2025-02-15 02:49:41,302 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 928 2025-02-15 02:49:41,302 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.08 seconds 2025-02-15 02:49:41,302 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,302 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 14761.18 MB 2025-02-15 02:49:41,302 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 16103.78 MB 2025-02-15 02:49:41,302 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 1342.60 MB 2025-02-15 02:49:41,302 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 18842.91 MB 2025-02-15 02:49:41,302 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 18842.91 MB 2025-02-15 02:49:41,302 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:49:41,303 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 17905.62 MB 2025-02-15 02:49:41,357 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding 2025-02-15 02:49:41,357 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1094 2025-02-15 02:49:41,357 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.05 seconds 2025-02-15 02:49:41,357 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,357 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 16602.18 MB 2025-02-15 02:49:41,357 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 16851.45 MB 2025-02-15 02:49:41,357 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 249.28 MB 2025-02-15 02:49:41,357 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 18842.91 MB 2025-02-15 02:49:41,357 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 18972.93 MB 2025-02-15 02:49:41,357 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 130.02 MB 2025-02-15 02:49:41,357 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 17093.03 MB 2025-02-15 02:49:41,366 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC 2025-02-15 02:49:41,366 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1395 2025-02-15 02:49:41,366 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.01 seconds 2025-02-15 02:49:41,366 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,366 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 16985.65 MB 2025-02-15 02:49:41,366 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 17190.69 MB 2025-02-15 02:49:41,366 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 205.04 MB 2025-02-15 02:49:41,366 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 18972.93 MB 2025-02-15 02:49:41,366 - 
resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 18977.13 MB 2025-02-15 02:49:41,366 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 4.19 MB 2025-02-15 02:49:41,366 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 17204.15 MB 2025-02-15 02:49:41,367 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal 2025-02-15 02:49:41,367 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 309 2025-02-15 02:49:41,367 - resource_logging.py:150 - __exit__ - DEBUG - Time: 2.90 seconds 2025-02-15 02:49:41,367 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,367 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 13442.54 MB 2025-02-15 02:49:41,367 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 17391.39 MB 2025-02-15 02:49:41,367 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 3948.85 MB 2025-02-15 02:49:41,367 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 49566.19 MB 2025-02-15 02:49:41,367 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 18977.13 MB 2025-02-15 02:49:41,367 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -30589.06 MB 2025-02-15 02:49:41,367 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 17391.39 MB 2025-02-15 02:49:41,636 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> model.forward 2025-02-15 02:49:41,636 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 390 2025-02-15 02:49:41,636 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.27 seconds 2025-02-15 02:49:41,636 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,636 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 17391.39 MB 2025-02-15 02:49:41,636 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 20399.90 MB 2025-02-15 02:49:41,636 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 3008.50 MB 2025-02-15 02:49:41,636 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 18977.13 MB 2025-02-15 02:49:41,636 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22064.14 MB 2025-02-15 02:49:41,637 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 3087.01 MB 2025-02-15 02:49:41,637 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 20701.22 MB 2025-02-15 02:49:41,654 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8147, cut from 8149 2025-02-15 02:49:41,655 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['The video rate for this video is 2 ('] 2025-02-15 02:49:41,661 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> lm_head, logits 2025-02-15 02:49:41,661 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 456 2025-02-15 02:49:41,661 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds 2025-02-15 02:49:41,661 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:49:41,661 - resource_logging.py:152 - 
__exit__ - DEBUG - Allocated before block: 20399.90 MB 2025-02-15 02:49:41,661 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 28823.10 MB 2025-02-15 02:49:41,661 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 8423.21 MB 2025-02-15 02:49:41,661 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22064.14 MB 2025-02-15 02:49:41,661 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 32535.22 MB 2025-02-15 02:49:41,661 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 10471.08 MB 2025-02-15 02:49:41,661 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 28823.10 MB 2025-02-15 02:49:41,825 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7939] 2025-02-15 02:49:41,826 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:41,826 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0] 2025-02-15 02:49:41,828 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:41,828 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0] 2025-02-15 02:49:41,832 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237] 2025-02-15 02:49:41,834 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:49:41,834 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0] 2025-02-15 02:49:41,834 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['The video rate for this video is 2 ('] 2025-02-15 02:50:42,716 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:42,717 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0] 2025-02-15 02:50:42,722 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224 2025-02-15 02:50:42,725 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:42,725 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 246, 3, 384, 384]), torch.float32, cuda:0] 2025-02-15 02:50:42,726 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:42,726 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 246, 3, 378, 378]), torch.float32, cuda:0] 2025-02-15 02:50:46,460 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino 2025-02-15 02:50:46,460 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 871 2025-02-15 02:50:46,460 - resource_logging.py:150 - __exit__ - DEBUG - Time: 3.73 seconds 2025-02-15 02:50:46,460 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:46,460 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 20015.33 MB 2025-02-15 02:50:46,460 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 20885.91 MB 2025-02-15 02:50:46,460 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 870.58 MB 2025-02-15 02:50:46,460 - 
resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 40911.24 MB 2025-02-15 02:50:46,460 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 24175.97 MB 2025-02-15 02:50:46,460 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -16735.27 MB 2025-02-15 02:50:46,460 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 29714.00 MB 2025-02-15 02:50:46,473 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame 2025-02-15 02:50:46,473 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 877 2025-02-15 02:50:46,473 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.01 seconds 2025-02-15 02:50:46,473 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:46,473 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 20885.91 MB 2025-02-15 02:50:46,473 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 20162.96 MB 2025-02-15 02:50:46,473 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -722.96 MB 2025-02-15 02:50:46,473 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 24175.97 MB 2025-02-15 02:50:46,473 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 24175.97 MB 2025-02-15 02:50:46,473 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:50:46,473 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 22051.80 MB 2025-02-15 02:50:46,878 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip 2025-02-15 02:50:46,878 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 892 2025-02-15 02:50:46,878 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.40 seconds 2025-02-15 02:50:46,878 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:46,878 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 20162.96 MB 2025-02-15 02:50:46,878 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 20273.11 MB 2025-02-15 02:50:46,878 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 110.15 MB 2025-02-15 02:50:46,878 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 24175.97 MB 2025-02-15 02:50:46,878 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22793.95 MB 2025-02-15 02:50:46,878 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -1382.02 MB 2025-02-15 02:50:46,878 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 24248.71 MB 2025-02-15 02:50:46,883 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 2025-02-15 02:50:46,883 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 933 2025-02-15 02:50:46,883 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.00 seconds 2025-02-15 02:50:46,883 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:46,883 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 20273.04 MB 2025-02-15 02:50:46,883 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 20665.02 MB 2025-02-15 02:50:46,883 - 
resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 391.98 MB 2025-02-15 02:50:46,883 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22793.95 MB 2025-02-15 02:50:46,883 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22793.95 MB 2025-02-15 02:50:46,883 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:50:46,883 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 20959.15 MB 2025-02-15 02:50:46,966 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group 2025-02-15 02:50:46,966 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 951 2025-02-15 02:50:46,966 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.08 seconds 2025-02-15 02:50:46,966 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:46,966 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 20665.02 MB 2025-02-15 02:50:46,966 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 21141.43 MB 2025-02-15 02:50:46,966 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 476.41 MB 2025-02-15 02:50:46,966 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22793.95 MB 2025-02-15 02:50:46,966 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22793.95 MB 2025-02-15 02:50:46,966 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:50:46,966 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 22281.62 MB 2025-02-15 02:50:46,967 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA 2025-02-15 02:50:46,967 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 928 2025-02-15 02:50:46,967 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.09 seconds 2025-02-15 02:50:46,967 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:46,967 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 20273.04 MB 2025-02-15 02:50:46,967 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 21141.43 MB 2025-02-15 02:50:46,967 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 868.39 MB 2025-02-15 02:50:46,967 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22793.95 MB 2025-02-15 02:50:46,967 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22793.95 MB 2025-02-15 02:50:46,967 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:50:46,967 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 22281.62 MB 2025-02-15 02:50:47,010 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding 2025-02-15 02:50:47,010 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1094 2025-02-15 02:50:47,010 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.04 seconds 2025-02-15 02:50:47,010 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:47,010 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 21601.10 MB 2025-02-15 02:50:47,010 - resource_logging.py:153 - 
__exit__ - DEBUG - Allocated after block: 16468.59 MB 2025-02-15 02:50:47,010 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -5132.51 MB 2025-02-15 02:50:47,010 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22793.95 MB 2025-02-15 02:50:47,010 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22917.68 MB 2025-02-15 02:50:47,010 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 123.73 MB 2025-02-15 02:50:47,010 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 21608.88 MB 2025-02-15 02:50:47,035 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC 2025-02-15 02:50:47,035 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1395 2025-02-15 02:50:47,035 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds 2025-02-15 02:50:47,035 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:47,035 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 16595.07 MB 2025-02-15 02:50:47,035 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 16794.67 MB 2025-02-15 02:50:47,035 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 199.60 MB 2025-02-15 02:50:47,035 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22917.68 MB 2025-02-15 02:50:47,035 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22917.68 MB 2025-02-15 02:50:47,035 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:50:47,035 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 16794.67 MB 2025-02-15 02:50:47,036 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal 2025-02-15 02:50:47,036 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 309 2025-02-15 02:50:47,036 - resource_logging.py:150 - __exit__ - DEBUG - Time: 4.31 seconds 2025-02-15 02:50:47,036 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:47,036 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 19158.25 MB 2025-02-15 02:50:47,036 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 16971.96 MB 2025-02-15 02:50:47,036 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -2186.29 MB 2025-02-15 02:50:47,036 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 40911.24 MB 2025-02-15 02:50:47,036 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22917.68 MB 2025-02-15 02:50:47,036 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -17993.56 MB 2025-02-15 02:50:47,036 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 16971.96 MB 2025-02-15 02:50:47,269 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> model.forward 2025-02-15 02:50:47,269 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 390 2025-02-15 02:50:47,269 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.23 seconds 2025-02-15 02:50:47,269 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:47,269 - resource_logging.py:152 - __exit__ - DEBUG - 
Allocated before block: 14308.22 MB 2025-02-15 02:50:47,269 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 16965.77 MB 2025-02-15 02:50:47,269 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 2657.55 MB 2025-02-15 02:50:47,269 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22917.68 MB 2025-02-15 02:50:47,269 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 22917.68 MB 2025-02-15 02:50:47,269 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:50:47,269 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 17231.50 MB 2025-02-15 02:50:47,285 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 7195, cut from 7197 2025-02-15 02:50:47,285 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['The final rate for this video is 2 ('] 2025-02-15 02:50:47,291 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> lm_head, logits 2025-02-15 02:50:47,291 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 456 2025-02-15 02:50:47,291 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds 2025-02-15 02:50:47,291 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:50:47,291 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 16965.77 MB 2025-02-15 02:50:47,291 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 24407.32 MB 2025-02-15 02:50:47,291 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 7441.55 MB 2025-02-15 02:50:47,291 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 22917.68 MB 2025-02-15 02:50:47,291 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 26617.05 MB 2025-02-15 02:50:47,291 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 3699.38 MB 2025-02-15 02:50:47,291 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 24407.32 MB 2025-02-15 02:50:47,434 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 6987] 2025-02-15 02:50:47,435 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:47,435 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0] 2025-02-15 02:50:47,436 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:47,436 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0] 2025-02-15 02:50:47,441 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237] 2025-02-15 02:50:47,442 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:47,442 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0] 2025-02-15 02:50:47,442 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['The final rate for this video is 2 ('] 2025-02-15 02:50:54,671 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:54,671 - resource_logging.py:45 - debug_tensor - DEBUG - In compute_loss(): 
inputs['labels']: [torch.Size([1, 8192]), torch.int64, cuda:0] 2025-02-15 02:50:54,676 - mm_trainer.py:618 - compute_loss - DEBUG - In compute_loss(): assistant token at position 224 2025-02-15 02:50:54,680 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:54,680 - resource_logging.py:45 - debug_tensor - DEBUG - images_0: [torch.Size([1, 1370, 3, 384, 384]), torch.float32, cuda:0] 2025-02-15 02:50:54,681 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:50:54,681 - resource_logging.py:45 - debug_tensor - DEBUG - images_1: [torch.Size([1, 1370, 3, 378, 378]), torch.float32, cuda:0] 2025-02-15 02:51:15,575 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino 2025-02-15 02:51:15,576 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 871 2025-02-15 02:51:15,576 - resource_logging.py:150 - __exit__ - DEBUG - Time: 20.89 seconds 2025-02-15 02:51:15,576 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:15,576 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 22515.09 MB 2025-02-15 02:51:15,576 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 27363.70 MB 2025-02-15 02:51:15,576 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 4848.62 MB 2025-02-15 02:51:15,576 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 34015.81 MB 2025-02-15 02:51:15,576 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 36612.08 MB 2025-02-15 02:51:15,576 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 2596.27 MB 2025-02-15 02:51:15,576 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 36289.81 MB 2025-02-15 02:51:15,652 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> select_frame 2025-02-15 02:51:15,652 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 877 2025-02-15 02:51:15,652 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.07 seconds 2025-02-15 02:51:15,652 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:15,652 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 27363.70 MB 2025-02-15 02:51:15,652 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 22900.05 MB 2025-02-15 02:51:15,652 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -4463.65 MB 2025-02-15 02:51:15,652 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 36612.08 MB 2025-02-15 02:51:15,652 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 45069.89 MB 2025-02-15 02:51:15,652 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 8457.81 MB 2025-02-15 02:51:15,652 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 40081.89 MB 2025-02-15 02:51:17,555 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:siglip 2025-02-15 02:51:17,555 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 892 2025-02-15 02:51:17,555 - resource_logging.py:150 - __exit__ - DEBUG - Time: 1.90 seconds 2025-02-15 02:51:17,555 - resource_logging.py:151 - 
__exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:17,555 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 22900.05 MB 2025-02-15 02:51:17,555 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 23430.89 MB 2025-02-15 02:51:17,555 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 530.84 MB 2025-02-15 02:51:17,555 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 45069.89 MB 2025-02-15 02:51:17,555 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 31763.46 MB 2025-02-15 02:51:17,555 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: -13306.43 MB 2025-02-15 02:51:17,555 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 27409.44 MB 2025-02-15 02:51:17,568 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> mm_projector_aux_0/1 2025-02-15 02:51:17,568 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 933 2025-02-15 02:51:17,568 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.01 seconds 2025-02-15 02:51:17,568 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:17,569 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 23430.89 MB 2025-02-15 02:51:17,569 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 25320.43 MB 2025-02-15 02:51:17,569 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 1889.53 MB 2025-02-15 02:51:17,569 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 31763.46 MB 2025-02-15 02:51:17,569 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 31763.46 MB 2025-02-15 02:51:17,569 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:51:17,569 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 26737.86 MB 2025-02-15 02:51:17,778 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA -> query_group 2025-02-15 02:51:17,778 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 951 2025-02-15 02:51:17,778 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.21 seconds 2025-02-15 02:51:17,778 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:17,778 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 25320.43 MB 2025-02-15 02:51:17,778 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 27562.28 MB 2025-02-15 02:51:17,778 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 2241.86 MB 2025-02-15 02:51:17,778 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 31763.46 MB 2025-02-15 02:51:17,778 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 35538.34 MB 2025-02-15 02:51:17,778 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 3774.87 MB 2025-02-15 02:51:17,778 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 33106.57 MB 2025-02-15 02:51:17,779 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> SVA 2025-02-15 02:51:17,779 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 928 2025-02-15 02:51:17,779 - resource_logging.py:150 - __exit__ - 
DEBUG - Time: 0.22 seconds 2025-02-15 02:51:17,779 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:17,779 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 23430.89 MB 2025-02-15 02:51:17,779 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 27562.28 MB 2025-02-15 02:51:17,779 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 4131.39 MB 2025-02-15 02:51:17,779 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 31763.46 MB 2025-02-15 02:51:17,779 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 35538.34 MB 2025-02-15 02:51:17,779 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 3774.87 MB 2025-02-15 02:51:17,779 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 33106.57 MB 2025-02-15 02:51:17,954 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> rearrange_vision_tower+padding 2025-02-15 02:51:17,954 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1094 2025-02-15 02:51:17,954 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.17 seconds 2025-02-15 02:51:17,954 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:17,954 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 29095.83 MB 2025-02-15 02:51:17,954 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 29862.83 MB 2025-02-15 02:51:17,954 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 767.00 MB 2025-02-15 02:51:17,954 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 35538.34 MB 2025-02-15 02:51:17,954 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 35955.67 MB 2025-02-15 02:51:17,954 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 417.33 MB 2025-02-15 02:51:17,955 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 30570.62 MB 2025-02-15 02:51:17,973 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> Embedding+Cross-modal+STC 2025-02-15 02:51:17,973 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/cambrian_arch.py, Line: 1395 2025-02-15 02:51:17,973 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds 2025-02-15 02:51:17,973 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:17,973 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 30275.72 MB 2025-02-15 02:51:17,973 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 30503.46 MB 2025-02-15 02:51:17,973 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 227.74 MB 2025-02-15 02:51:17,973 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 35955.67 MB 2025-02-15 02:51:17,973 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 35955.67 MB 2025-02-15 02:51:17,973 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:51:17,973 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 30718.98 MB 2025-02-15 02:51:17,974 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal 2025-02-15 02:51:17,974 - resource_logging.py:149 - __exit__ - DEBUG - File: 
/root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 309 2025-02-15 02:51:17,974 - resource_logging.py:150 - __exit__ - DEBUG - Time: 23.29 seconds 2025-02-15 02:51:17,974 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:17,974 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 17741.90 MB 2025-02-15 02:51:17,974 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 30704.31 MB 2025-02-15 02:51:17,974 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 12962.42 MB 2025-02-15 02:51:17,974 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 34015.81 MB 2025-02-15 02:51:17,974 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 35955.67 MB 2025-02-15 02:51:17,974 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 1939.87 MB 2025-02-15 02:51:17,974 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 30718.98 MB 2025-02-15 02:51:18,242 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> model.forward 2025-02-15 02:51:18,242 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 390 2025-02-15 02:51:18,242 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.27 seconds 2025-02-15 02:51:18,242 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:18,242 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 30704.31 MB 2025-02-15 02:51:18,242 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 22736.09 MB 2025-02-15 02:51:18,242 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: -7968.23 MB 2025-02-15 02:51:18,242 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 35955.67 MB 2025-02-15 02:51:18,242 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 35955.67 MB 2025-02-15 02:51:18,242 - resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 0.00 MB 2025-02-15 02:51:18,242 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 33207.38 MB 2025-02-15 02:51:18,260 - cambrian_llama.py:481 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): Found assistant token at index 8134, cut from 8136 2025-02-15 02:51:18,260 - cambrian_llama.py:487 - forward - INFO - In CambrianLlamaForCausalLM.forward(): Decoded assistant outputs: ['2 final rate for this video is 1 ('] 2025-02-15 02:51:18,266 - resource_logging.py:148 - __exit__ - DEBUG - Section name: CambrianLlamaForCausalLM -> forward -> lm_head, logits 2025-02-15 02:51:18,267 - resource_logging.py:149 - __exit__ - DEBUG - File: /root/hcmus/LongVidLLaMA/longvu/language_model/cambrian_llama.py, Line: 456 2025-02-15 02:51:18,267 - resource_logging.py:150 - __exit__ - DEBUG - Time: 0.02 seconds 2025-02-15 02:51:18,267 - resource_logging.py:151 - __exit__ - DEBUG - Device: cuda:0 2025-02-15 02:51:18,267 - resource_logging.py:152 - __exit__ - DEBUG - Allocated before block: 22736.09 MB 2025-02-15 02:51:18,267 - resource_logging.py:153 - __exit__ - DEBUG - Allocated after block: 31145.89 MB 2025-02-15 02:51:18,267 - resource_logging.py:154 - __exit__ - DEBUG - Net allocated change: 8409.81 MB 2025-02-15 02:51:18,267 - resource_logging.py:155 - __exit__ - DEBUG - Reserved before block: 35955.67 MB 2025-02-15 02:51:18,267 - resource_logging.py:156 - __exit__ - DEBUG - Reserved after block: 40135.29 MB 2025-02-15 02:51:18,267 - 
resource_logging.py:157 - __exit__ - DEBUG - Net reserved change: 4179.62 MB 2025-02-15 02:51:18,267 - resource_logging.py:158 - __exit__ - DEBUG - Peak allocated: 31145.89 MB 2025-02-15 02:51:18,429 - cambrian_llama.py:512 - forward - DEBUG - sample 0: correct range [16, 7926] 2025-02-15 02:51:18,431 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:51:18,431 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_logits: [torch.Size([1, 237, 128256]), torch.float32, cuda:0] 2025-02-15 02:51:18,432 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:51:18,432 - resource_logging.py:45 - debug_tensor - DEBUG - In CambrianLlamaForCausalLM.forward(): orig_labels: [torch.Size([1, 238]), torch.int64, cuda:0] 2025-02-15 02:51:18,437 - cambrian_llama.py:529 - forward - DEBUG - In CambrianLlamaForCausalLM.forward(): sample 0: output range: [225, 237] 2025-02-15 02:51:18,438 - resource_logging.py:42 - debug_tensor - DEBUG - File: Unknown, Line: Unknown 2025-02-15 02:51:18,438 - resource_logging.py:45 - debug_tensor - DEBUG - outs: [torch.Size([1, 12]), torch.int64, cuda:0] 2025-02-15 02:51:18,438 - cambrian_llama.py:533 - forward - INFO - sample 0: decoded outputs: ['2 final rate for this video is 1 (']
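The `debug_tensor` records in this segment (for example `outs: [torch.Size([1, 12]), torch.int64, cuda:0]`) report a tensor's shape, dtype and device; the caller's file and line are unresolved and logged as `Unknown`. The helper below is a hedged reconstruction of what such a `debug_tensor` function could look like, not the actual implementation in `resource_logging.py`.

```python
import logging

import torch

logger = logging.getLogger("resource_logging")


def debug_tensor(name: str, tensor: torch.Tensor) -> None:
    # Hypothetical reconstruction: the real helper tries to resolve the caller's
    # file and line, which appears as "Unknown" in the log above.
    logger.debug("File: Unknown, Line: Unknown")
    # Matches the "name: [torch.Size([...]), dtype, device]" format seen above.
    logger.debug("%s: [%s, %s, %s]", name, tensor.shape, tensor.dtype, tensor.device)
```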
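Each block of `__exit__` records above (Section name, File, Time, Allocated/Reserved before and after the block, net changes, Peak allocated) comes from a section profiler wrapped around a region of code. The sketch below is a minimal context manager that would emit fields of that shape using `torch.cuda.memory_allocated`, `torch.cuda.memory_reserved` and `torch.cuda.max_memory_allocated`; the class name `LogSection`, the peak-reset policy, and the omission of the File/Line fields are assumptions rather than details taken from `resource_logging.py`.

```python
import logging
import time

import torch

logger = logging.getLogger("resource_logging")
MB = 1024 ** 2


class LogSection:
    """Hedged sketch of a timing/memory profiler producing records like the
    '__exit__' blocks above (names and reset policy are assumptions)."""

    def __init__(self, name: str, device: str = "cuda:0") -> None:
        self.name = name
        self.device = device

    def __enter__(self) -> "LogSection":
        torch.cuda.reset_peak_memory_stats(self.device)
        self.start = time.time()
        self.alloc_before = torch.cuda.memory_allocated(self.device)
        self.reserved_before = torch.cuda.memory_reserved(self.device)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        alloc_after = torch.cuda.memory_allocated(self.device)
        reserved_after = torch.cuda.memory_reserved(self.device)
        peak = torch.cuda.max_memory_allocated(self.device)
        logger.debug("Section name: %s", self.name)
        logger.debug("Time: %.2f seconds", time.time() - self.start)
        logger.debug("Device: %s", self.device)
        logger.debug("Allocated before block: %.2f MB", self.alloc_before / MB)
        logger.debug("Allocated after block: %.2f MB", alloc_after / MB)
        logger.debug("Net allocated change: %.2f MB", (alloc_after - self.alloc_before) / MB)
        logger.debug("Reserved before block: %.2f MB", self.reserved_before / MB)
        logger.debug("Reserved after block: %.2f MB", reserved_after / MB)
        logger.debug("Net reserved change: %.2f MB", (reserved_after - self.reserved_before) / MB)
        logger.debug("Peak allocated: %.2f MB", peak / MB)
        return False
```

Under that reading, the nesting seen in the log corresponds to sections such as:

```python
# Illustrative only: section names copied from the log above.
with LogSection("CambrianLlamaForCausalLM -> forward -> prepare_inputs_labels_for_multimodal"):
    with LogSection("CambrianMetaForCausalLM -> prepare_inputs_labels_for_multimodal -> encode_images:dino"):
        pass  # vision-tower forward over the sampled frames would run here
```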
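The `images_0`/`images_1` shapes logged before each step explain why the third step is so much heavier than the first two: the frame dimension grows from 136 to 246 and then to 1370 frames per sample, and both vision-tower inputs are float32. A back-of-the-envelope size check, assuming only the logged shapes and dtype, is sketched below; at 1370 frames each tower input alone is on the order of 2 GiB, consistent with the 20.89 s `encode_images:dino` section and the larger reserved-memory swings in that step.

```python
from math import prod


def tensor_gib(shape, bytes_per_element=4):
    """Size in GiB of a dense tensor with the given shape (float32 by default)."""
    return prod(shape) * bytes_per_element / 1024 ** 3


# Shapes taken from the debug_tensor records above.
print(tensor_gib((1, 136, 3, 384, 384)))   # ~0.22 GiB  (384-px tower input, step 1)
print(tensor_gib((1, 1370, 3, 384, 384)))  # ~2.26 GiB  (384-px tower input, step 3)
print(tensor_gib((1, 1370, 3, 378, 378)))  # ~2.19 GiB  (378-px tower input, step 3)
```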
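The `compute_loss` and `forward` records also show how the supervised span is located: an assistant token is found in the padded 8192-token label sequence (position 224 here), the logits are cut around the corresponding position in the multimodal sequence, and the decoded span is the short range that follows it (for example output range [225, 237]). Purely as an illustration, with the token id and slicing policy as assumptions rather than facts about `mm_trainer.py`, locating that position could look like:

```python
import torch


def find_assistant_position(labels: torch.Tensor, assistant_token_id: int) -> int:
    """Return the first index of `assistant_token_id` in a [1, seq_len] label
    tensor, or -1 if it does not occur. Hypothetical helper mirroring the
    'assistant token at position 224' records above."""
    hits = (labels[0] == assistant_token_id).nonzero(as_tuple=True)[0]
    return int(hits[0].item()) if hits.numel() > 0 else -1
```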