# Knowledge Distillation Based on MMRazor
Knowledge Distillation is a classic model compression method. The core idea is to guide a lightweight student model to "imitate" a better-performing, more complex teacher model (or an ensemble of models), improving the student's performance without changing its structure. MMRazor is a model compression toolkit for model slimming and AutoML that supports several KD algorithms. In this project, we take TSM-MobileNetV2 as an example to show how to use MMRazor to perform knowledge distillation on action recognition models. You can refer to MMRazor for more model compression algorithms.
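To make the idea concrete, the sketch below shows a minimal logit-based distillation loss in plain PyTorch: the student is trained on a weighted sum of the ordinary cross-entropy and a KL-divergence term that pulls its temperature-softened logits towards the teacher's. This is only an illustration of the general technique, not the loss actually used in this project (that is defined by the MMRazor config); the `temperature` and `alpha` values here are arbitrary.

```python
import torch
import torch.nn.functional as F


def kd_logit_loss(student_logits, teacher_logits, labels,
                  temperature=4.0, alpha=0.5):
    """Minimal logit-distillation loss (illustrative only)."""
    # Hard-label term: standard cross-entropy against the ground truth.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction='batchmean',
    ) * temperature ** 2  # rescale so the soft term matches the hard term
    return (1 - alpha) * ce + alpha * kd


# Toy usage with random logits for a 400-class problem (e.g. Kinetics-400).
student_logits = torch.randn(8, 400)
teacher_logits = torch.randn(8, 400)
labels = torch.randint(0, 400, (8,))
print(kd_logit_loss(student_logits, teacher_logits, labels))
```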
## Description
This is an implementation of the MMRazor Knowledge Distillation application. We provide action recognition distillation configs and models for MMRazor.
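The distillation configs follow MMRazor's single-teacher distillation pattern: a wrapper model holds the student architecture and a frozen teacher, and a distiller records both models' classification logits and applies a distillation loss to them. The skeleton below is only a rough illustration of that structure; the type names, recorder sources, loss settings, and checkpoint path are assumptions based on typical MMRazor 1.x configs, so please rely on the provided config files for the exact fields.

```python
# Rough, illustrative skeleton of a logit-distillation config
# (not the exact config shipped with this project).
teacher_ckpt = 'path/to/tsm-r50_teacher.pth'  # hypothetical checkpoint path

model = dict(
    _scope_='mmrazor',
    type='SingleTeacherDistill',  # wraps the student and a frozen teacher
    architecture=dict(  # the student: TSM-MobileNetV2
        cfg_path='mmaction::recognition/tsm/tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py',
        pretrained=False),
    teacher=dict(  # the teacher: TSM-ResNet50 (assumed config path)
        cfg_path='mmaction::recognition/tsm/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb.py',
        pretrained=False),
    teacher_ckpt=teacher_ckpt,
    distiller=dict(
        type='ConfigurableDistiller',
        # Record the classification logits of both models ...
        student_recorders=dict(
            logits=dict(type='ModuleOutputs', source='cls_head.fc_cls')),
        teacher_recorders=dict(
            logits=dict(type='ModuleOutputs', source='cls_head.fc_cls')),
        # ... and feed them to a logit-level distillation loss.
        distill_losses=dict(
            loss_kd=dict(type='DISTLoss', tau=1, loss_weight=1)),
        loss_forward_mappings=dict(
            loss_kd=dict(
                logits_S=dict(from_student=True, recorder='logits'),
                logits_T=dict(from_student=False, recorder='logits')))))
```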
## Usage
### Prerequisites
- MMRazor v1.0.0 or higher
There are two install modes:
Option (a). Install as a Python package
```bash
mim install "mmrazor>=1.0.0"
```
Option (b). Install from source
```bash
git clone https://github.com/open-mmlab/mmrazor.git
cd mmrazor
pip install -v -e .
```
### Setup Environment
Please refer to Get Started to install MMAction2.
First, add the current folder to `PYTHONPATH` so that Python can find your code. Run the following command in the current directory to add it. Please run it every time after you open a new shell.
```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
```
### Data Preparation
Prepare the Kinetics-400 dataset according to the instructions. Create a symbolic link from `$MMACTION2/data` to `./data` in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.
```bash
ln -s ../../data ./data
```
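If you want to sanity-check the link before training, a small script like the one below can confirm that Python sees the data through `./data`. The annotation file names here are assumptions based on the standard MMAction2 Kinetics-400 layout; adjust them to your actual setup.

```python
from pathlib import Path

# Assumed standard MMAction2 Kinetics-400 layout; adjust to your setup.
expected = [
    'data/kinetics400/kinetics400_train_list_videos.txt',
    'data/kinetics400/kinetics400_val_list_videos.txt',
]

for rel in expected:
    path = Path(rel)
    status = 'found' if path.exists() else 'MISSING'
    print(f'{status}: {path.resolve()}')
```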
### Training commands
To train with a single GPU:

```bash
mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py
```
To train with multiple GPUs:
```bash
mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py --launcher pytorch --gpus 8
```
To train with multiple GPUs by Slurm:

```bash
mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```
### Testing commands
Please convert the knowledge distillation checkpoint to a student-only checkpoint with the following command. You will get a checkpoint with a `_student.pth` suffix in the same directory as the original checkpoint. Then use the student-only checkpoint for testing.
```bash
mim run mmrazor convert_kd_ckpt_to_student $CHECKPOINT
```
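Conceptually, the conversion keeps only the student branch of the distillation checkpoint. The sketch below is not the MMRazor script itself, just an illustration of the idea under the assumption that student weights are stored under an `architecture.` prefix (and teacher weights under a separate prefix), which may differ across MMRazor versions; rely on `convert_kd_ckpt_to_student` for the real conversion.

```python
import torch

# Illustrative only: assumes student keys are prefixed with 'architecture.',
# which may not hold for every MMRazor version.
ckpt = torch.load('kd_checkpoint.pth', map_location='cpu')  # hypothetical path
state_dict = ckpt['state_dict']

student_state = {
    key[len('architecture.'):]: value
    for key, value in state_dict.items()
    if key.startswith('architecture.')
}

torch.save({'state_dict': student_state, 'meta': ckpt.get('meta', {})},
           'kd_checkpoint_student.pth')
```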
To test with a single GPU:

```bash
mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT
```
To test with multiple GPUs:
```bash
mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8
```
To test with multiple GPUs by Slurm:

```bash
mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```
## Results and models
| Location | Dataset      | Teacher      | Student         | Acc (distilled student) | Acc (teacher) | Acc (student baseline) | Config | Download                |
| :------: | :----------: | :----------: | :-------------: | :---------------------: | :-----------: | :--------------------: | :----: | :---------------------: |
| logits   | Kinetics-400 | TSM-ResNet50 | TSM-MobileNetV2 | 69.60 (+0.9)            | 73.22         | 68.71                  | config | teacher \| model \| log |
| logits   | Kinetics-400 | TSN-Swin     | TSN-ResNet50    | 75.54 (+1.4)            | 79.22         | 74.12                  | config | teacher \| model \| log |
## Citation
```BibTeX
@article{huang2022knowledge,
  title={Knowledge Distillation from A Stronger Teacher},
  author={Huang, Tao and You, Shan and Wang, Fei and Qian, Chen and Xu, Chang},
  journal={arXiv preprint arXiv:2205.10536},
  year={2022}
}
```