---
license: mit
datasets:
- uoft-cs/cifar10
- jxie/stl10
metrics:
- accuracy
pipeline_tag: image-feature-extraction
---

# Masked Autoencoders (MAE) with PCA-Based Variance Filtering

## Overview

This repository contains Masked Autoencoder (MAE) models trained on the CIFAR-10 and STL-10 datasets. The models are trained under different settings, each reconstructing images from principal component analysis (PCA) components that account for a chosen share of the explained variance. The goal is to explore how concentrating on low-variance image components can improve representation learning.

The distinction between low-variance and high-variance components of a dataset follows the paper "Learning by Reconstruction Produces Uninformative Features For Perception" by Randall Balestriero and Yann LeCun, available on [arXiv](https://arxiv.org/abs/2402.11337).

### Models Included:

- **No Mode:** MAE trained on the original images.
- **Bottom 25% Variance:** MAE trained to reconstruct images using the components with the lowest 25% of the explained variance.
- **Bottom 10% Variance:** MAE trained to reconstruct images using the components with the lowest 10% of the explained variance.
- **Top 75% Variance:** MAE trained to reconstruct images using the components with the highest 75% of the explained variance.
- **Top 60% Variance:** MAE trained to reconstruct images using the components with the highest 60% of the explained variance.

## Model Details

### Dataset

- **CIFAR-10** and **STL-10**: Both datasets were used for training. Each consists of labeled images across several classes; PCA was performed on each dataset to separate its principal components by explained variance.

### Training Procedure

- **PCA Application:** PCA was applied to the dataset images to split the components by explained variance (a sketch of this step appears at the end of this card).
- **MAE Training:**
  - **No Mode:** Standard MAE training on the original images.
  - **Bottom 25% Variance:** MAE trained to reconstruct images from only the bottom 25% variance components (the bottom 10% variant is analogous).
  - **Top 75% Variance:** MAE trained to reconstruct images from the top 75% variance components (the top 60% variant is analogous).

### Evaluation

- **Fine-Tuning:** The pre-trained models were fine-tuned on classification tasks.
- **Linear Probing:** The quality of the learned representations was assessed by training a linear classifier on frozen features (see the sketch at the end of this card).

## Results

The models are expected to perform differently depending on which variance components they reconstructed during training:

- **Bottom 25% Variance:** Expected to yield better representations for perception, since low-variance components carry much of the detailed, class-relevant image information.
- **Top 75% Variance:** Expected to perform worse, since high-variance components capture broad image statistics that are less informative for perception.

## Usage

You can download the models trained on the CIFAR-10 dataset directly from the respective folders:

- **No Mode**: [Link to vit-t-mae-pretrain.pt](https://huggingface.co/turhancan97/MAE-Models/tree/main/cifar10/no_mode)
- **Bottom 25% Variance**: [Link to vit-t-mae-pretrain.pt](https://huggingface.co/turhancan97/MAE-Models/tree/main/cifar10/bottom_25)
- **Top 75% Variance**: [Link to vit-t-mae-pretrain.pt](https://huggingface.co/turhancan97/MAE-Models/tree/main/cifar10/top_75)

## How to Use

To use a pre-trained model, load it as follows:

```python
import torch

# As written, this assumes the checkpoint stores a full model object, so the
# model's class definition (from the GitHub repository below) must be
# importable. On recent PyTorch, weights_only=False is required for such
# files; only load checkpoints you trust.
model = torch.load('path_to_model.pt', map_location='cpu', weights_only=False)
model.eval()

# Example usage (input_data is a batch of images shaped as the model expects)
with torch.no_grad():
    output = model(input_data)
```

## Additional Resources

- **GitHub Repository**: For training scripts and further details, visit the [GitHub Repository](https://github.com/turhancan97/Learning-by-Reconstruction-with-MAE).
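
## Example Sketches

The snippets below are minimal, illustrative sketches rather than the repository's exact code; the authoritative scripts live in the GitHub repository above. First, the checkpoints can be fetched programmatically with `huggingface_hub`; the `cifar10/no_mode/vit-t-mae-pretrain.pt` path is inferred from the folder links in the Usage section.

```python
# Sketch: fetch a checkpoint from the Hub, then load it as in "How to Use".
from huggingface_hub import hf_hub_download
import torch

ckpt_path = hf_hub_download(
    repo_id="turhancan97/MAE-Models",
    filename="cifar10/no_mode/vit-t-mae-pretrain.pt",  # path from the links above
)
model = torch.load(ckpt_path, map_location="cpu", weights_only=False)
model.eval()
```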
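
The PCA filtering step from the Training Procedure can be sketched as follows, assuming images are flattened to vectors, that `sklearn.decomposition.PCA` is used, and that "top X%" means the smallest set of highest-variance components reaching X% of the explained variance (with "bottom" as its complement). The function name `filter_variance` and the exact thresholding convention are assumptions and may differ from the repository's implementation.

```python
# Sketch: reconstruct images from the lowest- or highest-variance PCA
# components (names and conventions here are illustrative assumptions).
import numpy as np
from sklearn.decomposition import PCA

def filter_variance(images: np.ndarray, fraction: float, keep: str = "bottom") -> np.ndarray:
    """Reconstruct images from a subset of principal components.

    images:   (N, H, W, C) array
    fraction: share of explained variance to keep, e.g. 0.25
    keep:     "top" keeps the highest-variance components reaching `fraction`;
              "bottom" keeps the complement of the top (1 - fraction) set.
    """
    n, h, w, c = images.shape
    flat = images.reshape(n, -1)              # (N, H*W*C)
    pca = PCA().fit(flat)                     # full decomposition
    coords = pca.transform(flat)              # coordinates in PC space

    cum = np.cumsum(pca.explained_variance_ratio_)  # components sorted high -> low
    if keep == "top":
        k = np.searchsorted(cum, fraction) + 1      # smallest prefix reaching `fraction`
        mask = np.arange(cum.size) < k
    else:
        k = np.searchsorted(cum, 1.0 - fraction) + 1
        mask = np.arange(cum.size) >= k             # the low-variance tail

    coords[:, ~mask] = 0.0                    # drop the discarded components
    recon = pca.inverse_transform(coords)
    return recon.reshape(n, h, w, c)

# Example with random stand-in data shaped like CIFAR-10:
images = np.random.rand(512, 32, 32, 3).astype(np.float32)
bottom_25 = filter_variance(images, 0.25, keep="bottom")
top_75 = filter_variance(images, 0.75, keep="top")
```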
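
Finally, the linear-probing evaluation can be sketched as below: the encoder is frozen and only a linear head is trained on its features. The `encoder` module, the assumption that it maps a batch of images to `(batch, feature_dim)` features, and the hyperparameters are all placeholders.

```python
# Sketch: linear probing of a frozen encoder (all names are placeholders).
import torch
import torch.nn as nn

def linear_probe(encoder: nn.Module, loader, feature_dim: int,
                 num_classes: int = 10, epochs: int = 10, lr: float = 1e-3,
                 device: str = "cpu") -> nn.Linear:
    """Train a linear classifier on frozen encoder features."""
    encoder = encoder.to(device).eval()
    for p in encoder.parameters():            # freeze the pre-trained encoder
        p.requires_grad = False

    head = nn.Linear(feature_dim, num_classes).to(device)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in loader:          # loader yields (images, labels)
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = encoder(images)        # assumed shape: (batch, feature_dim)
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```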