# Knowledge Distillation Based on MMRazor
Knowledge Distillation is a classic model compression method. The core idea is to guide a lightweight student model to "imitate" a better-performing, more complex teacher model (or an ensemble of models), improving the student's performance without changing its structure. MMRazor is a model compression toolkit for model slimming and AutoML that supports several KD algorithms. In this project, we take TSM-MobileNetV2 as an example to show how to use MMRazor to perform knowledge distillation on action recognition models. You can refer to MMRazor for more model compression algorithms.
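To make the idea concrete, the sketch below shows a minimal logit-based distillation loss in plain PyTorch: the student is trained on a weighted sum of the ordinary cross-entropy and a KL-divergence term that pulls its temperature-softened logits towards the teacher's. This is only an illustration of the general technique, not the loss actually used in this project (that is defined by the MMRazor config); the `temperature` and `alpha` values here are arbitrary.

```python
import torch
import torch.nn.functional as F


def kd_logit_loss(student_logits, teacher_logits, labels,
                  temperature=4.0, alpha=0.5):
    """Minimal logit-distillation loss (illustrative only)."""
    # Hard-label term: standard cross-entropy against the ground truth.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction='batchmean',
    ) * temperature ** 2  # rescale so the soft term matches the hard term
    return (1 - alpha) * ce + alpha * kd


# Toy usage with random logits for a 400-class problem (e.g. Kinetics-400).
student_logits = torch.randn(8, 400)
teacher_logits = torch.randn(8, 400)
labels = torch.randint(0, 400, (8,))
print(kd_logit_loss(student_logits, teacher_logits, labels))
```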
## Description
This is an implementation of the MMRazor Knowledge Distillation application. We provide action recognition distillation configs and models for MMRazor.
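The distillation configs follow MMRazor's single-teacher distillation pattern: a wrapper model holds the student architecture and a frozen teacher, and a distiller records both models' classification logits and applies a distillation loss to them. The skeleton below is only a rough illustration of that structure; the type names, recorder sources, loss settings, and checkpoint path are assumptions based on typical MMRazor 1.x configs, so please rely on the provided config files for the exact fields.

```python
# Rough, illustrative skeleton of a logit-distillation config
# (not the exact config shipped with this project).
teacher_ckpt = 'path/to/tsm-r50_teacher.pth'  # hypothetical checkpoint path

model = dict(
    _scope_='mmrazor',
    type='SingleTeacherDistill',  # wraps the student and a frozen teacher
    architecture=dict(  # the student: TSM-MobileNetV2
        cfg_path='mmaction::recognition/tsm/tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py',
        pretrained=False),
    teacher=dict(  # the teacher: TSM-ResNet50 (assumed config path)
        cfg_path='mmaction::recognition/tsm/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb.py',
        pretrained=False),
    teacher_ckpt=teacher_ckpt,
    distiller=dict(
        type='ConfigurableDistiller',
        # Record the classification logits of both models ...
        student_recorders=dict(
            logits=dict(type='ModuleOutputs', source='cls_head.fc_cls')),
        teacher_recorders=dict(
            logits=dict(type='ModuleOutputs', source='cls_head.fc_cls')),
        # ... and feed them to a logit-level distillation loss.
        distill_losses=dict(
            loss_kd=dict(type='DISTLoss', tau=1, loss_weight=1)),
        loss_forward_mappings=dict(
            loss_kd=dict(
                logits_S=dict(from_student=True, recorder='logits'),
                logits_T=dict(from_student=False, recorder='logits')))))
```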
## Usage
### Prerequisites
- MMRazor v1.0.0 or higher
There are two install modes:
Option (a). Install as a Python package
```bash
mim install "mmrazor>=1.0.0"
```
Option (b). Install from source
```bash
git clone https://github.com/open-mmlab/mmrazor.git
cd mmrazor
pip install -v -e .
```
### Setup Environment
Please refer to Get Started to install MMAction2.
First, add the current folder to `PYTHONPATH` so that Python can find your code. Run the following command in the current directory to add it. Please run it every time after you open a new shell.
```bash
export PYTHONPATH=`pwd`:$PYTHONPATH
```
### Data Preparation
Prepare the Kinetics-400 dataset according to the instructions. Create a symbolic link from `$MMACTION2/data` to `./data` in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link.
```bash
ln -s ../../data ./data
```
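If you want to sanity-check the link before training, a small script like the one below can confirm that Python sees the data through `./data`. The annotation file names here are assumptions based on the standard MMAction2 Kinetics-400 layout; adjust them to your actual setup.

```python
from pathlib import Path

# Assumed standard MMAction2 Kinetics-400 layout; adjust to your setup.
expected = [
    'data/kinetics400/kinetics400_train_list_videos.txt',
    'data/kinetics400/kinetics400_val_list_videos.txt',
]

for rel in expected:
    path = Path(rel)
    status = 'found' if path.exists() else 'MISSING'
    print(f'{status}: {path.resolve()}')
```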
### Training commands
To train with a single GPU:

```bash
mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py
```
To train with multiple GPUs:
```bash
mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py --launcher pytorch --gpus 8
```
To train with multiple GPUs by Slurm:

```bash
mim train mmrazor configs/kd_logits_tsm-res50_tsm-mobilenetv2_8xb16_k400.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```
### Testing commands
Please convert the knowledge distillation checkpoint to a student-only checkpoint with the following command. You will get a checkpoint with a `_student.pth` suffix in the same directory as the original checkpoint. Then use the student-only checkpoint for testing.
```bash
mim run mmrazor convert_kd_ckpt_to_student $CHECKPOINT
```
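Conceptually, the conversion keeps only the student branch of the distillation checkpoint. The sketch below is not the MMRazor script itself, just an illustration of the idea under the assumption that student weights are stored under an `architecture.` prefix (and teacher weights under a separate prefix), which may differ across MMRazor versions; rely on `convert_kd_ckpt_to_student` for the real conversion.

```python
import torch

# Illustrative only: assumes student keys are prefixed with 'architecture.',
# which may not hold for every MMRazor version.
ckpt = torch.load('kd_checkpoint.pth', map_location='cpu')  # hypothetical path
state_dict = ckpt['state_dict']

student_state = {
    key[len('architecture.'):]: value
    for key, value in state_dict.items()
    if key.startswith('architecture.')
}

torch.save({'state_dict': student_state, 'meta': ckpt.get('meta', {})},
           'kd_checkpoint_student.pth')
```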
To test with a single GPU:

```bash
mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT
```
To test with multiple GPUs:
```bash
mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8
```
To test with multiple GPUs by Slurm:

```bash
mim test mmaction tsm_imagenet-pretrained-mobilenetv2_8xb16-1x1x8-100e_kinetics400-rgb.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```
## Results and models
| Location | Dataset      | Teacher      | Student         | Acc (distilled student) | Acc (teacher) | Acc (student baseline) | Config | Download                |
| :------: | :----------: | :----------: | :-------------: | :---------------------: | :-----------: | :--------------------: | :----: | :---------------------: |
| logits   | Kinetics-400 | TSM-ResNet50 | TSM-MobileNetV2 | 69.60 (+0.9)            | 73.22         | 68.71                  | config | teacher \| model \| log |
| logits   | Kinetics-400 | TSN-Swin     | TSN-ResNet50    | 75.54 (+1.4)            | 79.22         | 74.12                  | config | teacher \| model \| log |
## Citation
```BibTeX
@article{huang2022knowledge,
  title={Knowledge Distillation from A Stronger Teacher},
  author={Huang, Tao and You, Shan and Wang, Fei and Qian, Chen and Xu, Chang},
  journal={arXiv preprint arXiv:2205.10536},
  year={2022}
}
```