# MSG3D Project

**Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition**

## Abstract
Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
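The core of the disentangled aggregation scheme is to replace powers of the adjacency matrix (which over-weight close neighbors) with k-hop adjacency matrices that keep only node pairs at shortest-path distance exactly k. The sketch below illustrates that idea in NumPy; it is a minimal, illustrative reconstruction under the paper's definition, not the authors' implementation, and the function name `k_adjacency` is ours.

```python
import numpy as np

def k_adjacency(A, k, with_self=True):
    """Disentangled k-hop adjacency for a binary adjacency matrix A.

    Keeps only pairs (i, j) whose shortest-path distance is exactly k
    (plus optional self-loops), instead of summing adjacency powers,
    which would bias aggregation toward nearby joints.
    """
    I = np.eye(A.shape[0], dtype=A.dtype)
    if k == 0:
        return I
    # (A + I)^k is nonzero wherever j is reachable from i within k hops;
    # subtracting the (k-1)-hop reachability leaves exactly-k-hop pairs.
    reach_k = np.minimum(np.linalg.matrix_power(A + I, k), 1)
    reach_km1 = np.minimum(np.linalg.matrix_power(A + I, k - 1), 1)
    Ak = reach_k - reach_km1
    return Ak + I if with_self else Ak
```

Each scale k then gets its own normalized adjacency and its own learnable weights, so distant joints contribute on equal footing with immediate neighbors.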

## Usage

### Setup Environment
Please refer to Installation to install MMAction2.

Assume that you are located at `$MMACTION2/projects/msg3d`.

Add the current folder to `PYTHONPATH`, so that Python can find your code. Run the following command in the current directory to add it. Please run it every time after you open a new shell:

```shell
export PYTHONPATH=`pwd`:$PYTHONPATH
```
### Data Preparation

Prepare the NTU60 dataset according to the instruction.

Create a symbolic link from `$MMACTION2/data` to `./data` in the current directory, so that Python can locate your data. Run the following command in the current directory to create the symbolic link:

```shell
ln -s ../../data ./data
```
### Training commands

To train with a single GPU:

```shell
mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py
```

To train with multiple GPUs:

```shell
mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher pytorch --gpus 8
```

To train with multiple GPUs by slurm:

```shell
mim train mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```
### Testing commands

To test with a single GPU:

```shell
mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT
```

To test with multiple GPUs:

```shell
mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher pytorch --gpus 8
```

To test with multiple GPUs by slurm:

```shell
mim test mmaction configs/msg3d_8xb16-joint-u100-80e_ntu60-xsub-keypoint-2d.py --checkpoint $CHECKPOINT --launcher slurm \
    --gpus 8 --gpus-per-node 8 --partition $PARTITION
```
## Results

### NTU60_XSub_2D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| uniform 100 | joint | 8 | MSG3D | 92.3 | 10 clips | config | ckpt | log |

### NTU60_XSub_3D

| frame sampling strategy | modality | gpus | backbone | top1 acc | testing protocol | config | ckpt | log |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| uniform 100 | joint | 8 | MSG3D | 89.6 | 10 clips | config | ckpt | log |
## Citation

```BibTeX
@inproceedings{liu2020disentangling,
  title={Disentangling and unifying graph convolutions for skeleton-based action recognition},
  author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
  booktitle={CVPR},
  pages={143--152},
  year={2020}
}
```