How are the weights of PatchEmbed initialized?
Here is the initialization code of Qwen2VisionTransformerPretrainedModel:
```python
class Qwen2VisionTransformerPretrainedModel(Qwen2VLPreTrainedModel):
    config_class = Qwen2VLVisionConfig
    _no_split_modules = ["Qwen2VLVisionBlock"]

    def __init__(self, config) -> None:
        super().__init__(config)
        self.spatial_merge_size = config.spatial_merge_size

        self.patch_embed = PatchEmbed(
            patch_size=config.patch_size,
            temporal_patch_size=config.temporal_patch_size,
            in_channels=config.in_channels,
            embed_dim=config.embed_dim,
        )
```
Here is the code of PatchEmbed:
```python
class PatchEmbed(nn.Module):
    def __init__(
        self,
        patch_size: int = 14,
        temporal_patch_size: int = 2,
        in_channels: int = 3,
        embed_dim: int = 1152,
    ) -> None:
        super().__init__()
        self.patch_size = patch_size
        self.temporal_patch_size = temporal_patch_size
        self.in_channels = in_channels
        self.embed_dim = embed_dim

        kernel_size = [temporal_patch_size, patch_size, patch_size]
        self.proj = nn.Conv3d(in_channels, embed_dim, kernel_size=kernel_size, stride=kernel_size, bias=False)
```
The weights of this nn.Conv3d are initialized randomly, and I don't see any other place in the code where they are set. Here are my questions:
Q1. PatchEmbed is used to embed image patches. Given that the Conv3d kernel weights are random, what is the point of it? I would then expect the embedded patches to be different from run to run. Is that how it is supposed to work?
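To make Q1 concrete, here is a minimal sketch of what I mean, reusing the PatchEmbed class quoted above (no checkpoint involved, just two fresh constructions with PyTorch's default Conv3d init and no manual seeding):

```python
import torch

# Two freshly constructed PatchEmbed modules: their Conv3d weights come from
# the default random init, so I would expect them to differ from each other.
pe1 = PatchEmbed(patch_size=14, temporal_patch_size=2, in_channels=3, embed_dim=1152)
pe2 = PatchEmbed(patch_size=14, temporal_patch_size=2, in_channels=3, embed_dim=1152)

print(torch.equal(pe1.proj.weight, pe2.proj.weight))  # False, as far as I understand
```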
Q2. In fact, when I load the model

```python
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
```
the weights are exactly the same from call to call! So the weights must be either deterministically generated or initialized from pre-defined values. Where, and how?
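To make Q2 concrete, here is roughly how I checked. The attribute path `visual.patch_embed.proj` is how I read the code above (it may differ in other transformers versions), and I dropped `device_map` just to keep the comparison simple:

```python
import torch
from transformers import Qwen2VLForConditionalGeneration

# Load the same checkpoint twice and compare the patch-embedding weights.
# In my runs they match exactly, which is why I conclude they come from
# somewhere fixed rather than from a fresh random initialization.
m1 = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto")
m2 = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto")

w1 = m1.visual.patch_embed.proj.weight
w2 = m2.visual.patch_embed.proj.weight
print(torch.equal(w1.cpu(), w2.cpu()))  # True in my runs
```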
Thank you for your help, experts!