This repo is the official PyTorch implementation for the paper MARLIN: Masked Autoencoder for facial video Representation LearnINg (CVPR 2023) (arXiv).
transformers (HuggingFace) for Feature Extraction
Requirements: PyTorch (torch) and transformers, which the example below imports.
Currently, the Hugging Face model supports only direct feature extraction. It does not perform any video pre-processing (e.g. face detection, cropping, or strided windowing), so the input must already be a prepared clip tensor.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ControlNet/marlin_vit_small_ytf",  # or other variants
    trust_remote_code=True
)

tensor = torch.rand([1, 3, 16, 224, 224])  # (B, C, T, H, W)
output = model(tensor)  # torch.Size([1, 1568, 384])
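Because strided windowing is left to the caller, a longer face video has to be cut into fixed-length clips before it reaches the model. The following is a minimal sketch of computing the clip start indices only. The helper name `clip_starts` and the stride of 8 are assumptions for illustration and are not part of the MARLIN or transformers API; the clip length of 16 matches the T dimension in the example above.

```python
from typing import List


def clip_starts(num_frames: int, clip_len: int = 16, stride: int = 8) -> List[int]:
    """Return the first-frame index of every full clip in a strided window.

    Hypothetical helper: clip_len=16 mirrors T in the example tensor,
    stride=8 is an assumed value chosen only for illustration.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    # Only full windows are kept; trailing frames that cannot fill a
    # complete clip are dropped. Videos shorter than clip_len yield [].
    return list(range(0, num_frames - clip_len + 1, stride))


starts = clip_starts(num_frames=40)
print(starts)  # [0, 8, 16, 24]
```

For a pre-cropped video tensor laid out as (B, C, T, H, W), each start index s would select the frames video[:, :, s:s + 16] as one input clip. Face detection and cropping still need to be done separately beforehand.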
This project is under the CC BY-NC 4.0 license. See LICENSE for details.
If you find this work useful for your research, please consider citing it.
@inproceedings{cai2022marlin,
  title = {MARLIN: Masked Autoencoder for facial video Representation LearnINg},
  author = {Cai, Zhixi and Ghosh, Shreya and Stefanov, Kalin and Dhall, Abhinav and Cai, Jianfei and Rezatofighi, Hamid and Haffari, Reza and Hayat, Munawar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023},
  month = {June},
  pages = {1493-1504},
  doi = {10.1109/CVPR52729.2023.00150},
  publisher = {IEEE},
}