MARLIN: Masked Autoencoder for facial video Representation LearnINg

This repository is the official PyTorch implementation of the paper MARLIN: Masked Autoencoder for facial video Representation LearnINg (CVPR 2023; preprint on arXiv).

Use transformers (HuggingFace) for Feature Extraction

Requirements:

  • Python
  • PyTorch
  • transformers
  • einops

Currently, the HuggingFace model performs direct feature extraction only. It does not include any video preprocessing (e.g., face detection, cropping, or strided windowing), so these steps must be done before calling the model.
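Because the model does not window the video itself, a clip longer than the model's input length has to be split into fixed-length chunks first. The sketch below computes frame ranges for such chunks. It assumes 16-frame windows (matching the T dimension used in the example below), and `strided_windows` is a hypothetical helper, not part of MARLIN or transformers. Face detection and cropping are not covered here.

```python
from typing import List, Tuple


def strided_windows(num_frames: int, clip_len: int = 16, stride: int = 16) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of fixed-length windows.

    Hypothetical helper (assumption, not MARLIN's own preprocessing).
    Trailing frames that do not fill a complete window are dropped.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    return [(start, start + clip_len)
            for start in range(0, num_frames - clip_len + 1, stride)]


# A 40-frame clip split into non-overlapping 16-frame windows
print(strided_windows(40))            # [(0, 16), (16, 32)]
# Overlapping windows with a stride of 8 frames
print(strided_windows(40, stride=8))  # [(0, 16), (8, 24), (16, 32), (24, 40)]
```

Each range can then be used to slice a video tensor of shape (B, C, T, H, W) along the time axis, e.g. `video[:, :, start:end]`, before passing it to the model. Whether to drop, pad, or overlap the trailing frames is a design choice left to the user.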

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ControlNet/marlin_vit_small_ytf",  # or other variants
    trust_remote_code=True
)
tensor = torch.rand([1, 3, 16, 224, 224])  # (B, C, T, H, W): one clip of 16 RGB frames at 224x224
output = model(tensor)  # torch.Size([1, 1568, 384]), i.e. (B, num_tokens, embed_dim)
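The 1568 tokens in the output shape can be reproduced with simple arithmetic. The sketch below assumes a ViT-style video tokenizer in which each token covers a tubelet of 2 frames and a 16x16 spatial patch; these two values are assumptions chosen because they reproduce the documented shape, and `marlin_token_count` is a hypothetical helper. The last dimension, 384, is the embedding width reported for this small variant.

```python
def marlin_token_count(frames: int, height: int, width: int,
                       tubelet_size: int = 2, patch_size: int = 16) -> int:
    """Number of tokens for a (T, H, W) clip under the assumed tokenization.

    Assumption: each token spans `tubelet_size` frames and a
    `patch_size` x `patch_size` spatial patch.
    """
    for dim, step in ((frames, tubelet_size), (height, patch_size), (width, patch_size)):
        if dim % step != 0:
            raise ValueError(f"dimension {dim} is not divisible by {step}")
    return (frames // tubelet_size) * (height // patch_size) * (width // patch_size)


# (16 / 2) * (224 / 16) * (224 / 16) = 8 * 14 * 14
print(marlin_token_count(16, 224, 224))  # 1568
```

Under these assumptions, inputs whose T, H, and W are not multiples of the tubelet and patch sizes cannot be tokenized evenly, which is one more reason to crop and window the video beforehand.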

License

This project is under the CC BY-NC 4.0 license. See LICENSE for details.

References

If you find this work useful for your research, please consider citing it.

@inproceedings{cai2022marlin,
  title = {MARLIN: Masked Autoencoder for facial video Representation LearnINg},
  author = {Cai, Zhixi and Ghosh, Shreya and Stefanov, Kalin and Dhall, Abhinav and Cai, Jianfei and Rezatofighi, Hamid and Haffari, Reza and Hayat, Munawar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023},
  month = {June},
  pages = {1493-1504},
  doi = {10.1109/CVPR52729.2023.00150},
  publisher = {IEEE},
}
Model size: 22.5M parameters (Safetensors, F32 tensors)

Base model: ControlNet/marlin_vit_small_ytf is fine-tuned from ControlNet/MARLIN.
