How are the weights of PatchEmbed initialized?
Here is the initialization code of Qwen2VisionTransformerPretrainedModel:
```python
class Qwen2VisionTransformerPretrainedModel(Qwen2VLPreTrainedModel):
    config_class = Qwen2VLVisionConfig
    _no_split_modules = ["Qwen2VLVisionBlock"]

    def __init__(self, config) -> None:
        super().__init__(config)
        self.spatial_merge_size = config.spatial_merge_size

        self.patch_embed = PatchEmbed(
            patch_size=config.patch_size,
            temporal_patch_size=config.temporal_patch_size,
            in_channels=config.in_channels,
            embed_dim=config.embed_dim,
        )
```
Here is the code of PatchEmbed:
```python
class PatchEmbed(nn.Module):
    def __init__(
        self,
        patch_size: int = 14,
        temporal_patch_size: int = 2,
        in_channels: int = 3,
        embed_dim: int = 1152,
    ) -> None:
        super().__init__()
        self.patch_size = patch_size
        self.temporal_patch_size = temporal_patch_size
        self.in_channels = in_channels
        self.embed_dim = embed_dim

        kernel_size = [temporal_patch_size, patch_size, patch_size]
        self.proj = nn.Conv3d(in_channels, embed_dim, kernel_size=kernel_size, stride=kernel_size, bias=False)
```
The weights of this nn.Conv3d are initialized randomly, and I don't see any other place in the code where they are set. Here are my questions:
Q1. PatchEmbed is used to embed image patches. Given that the Conv3d kernel weights are random, what is the point of it? I would then expect the embedded patches to be different from run to run. Is that how it is supposed to work?
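To make Q1 concrete, here is a minimal sketch of what I mean, reusing the PatchEmbed class quoted above (no checkpoint involved, just two fresh constructions with PyTorch's default Conv3d init and no manual seeding):

```python
import torch

# Two freshly constructed PatchEmbed modules: their Conv3d weights come from
# the default random init, so I would expect them to differ from each other.
pe1 = PatchEmbed(patch_size=14, temporal_patch_size=2, in_channels=3, embed_dim=1152)
pe2 = PatchEmbed(patch_size=14, temporal_patch_size=2, in_channels=3, embed_dim=1152)

print(torch.equal(pe1.proj.weight, pe2.proj.weight))  # False, as far as I understand
```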
Q2. In fact, when I load the model

```python
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
```
the weights are exactly the same from call to call! So the weights must be either deterministically generated or initialized from pre-defined values. Where, and how?
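To make Q2 concrete, here is roughly how I checked. The attribute path `visual.patch_embed.proj` is how I read the code above (it may differ in other transformers versions), and I dropped `device_map` just to keep the comparison simple:

```python
import torch
from transformers import Qwen2VLForConditionalGeneration

# Load the same checkpoint twice and compare the patch-embedding weights.
# In my runs they match exactly, which is why I conclude they come from
# somewhere fixed rather than from a fresh random initialization.
m1 = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto")
m2 = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto")

w1 = m1.visual.patch_embed.proj.weight
w2 = m2.visual.patch_embed.proj.weight
print(torch.equal(w1.cpu(), w2.cpu()))  # True in my runs
```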
Thank you for your help, experts!