Spatial joint features for a 3D human skeletal action recognition system using spatial graph kernels


  • P. V. V. Kishore
  • P. Siva Kameswari
  • K. Niharika
  • M. Tanuja
  • M. Bindu
  • D. Anil Kumar
  • E. Kiran Kumar
  • M. Teja Kiran





Keywords: Human Action Recognition, Skeleton Maps, Spatial Graph Kernels, Graph Matching.


Human action recognition is a vibrant research area with applications across human-machine interfaces. In this work, we propose a human action recognition method based on spatial graph kernels over 3D skeletal data. Spatial joint features are extracted as pairwise distances between human joints in 3D space. For each action frame in a video sequence, a spatial graph is constructed with the 3D joint positions as vertices and the computed joint distances as edge weights. Spatial graph kernels between the training and testing sets then measure the similarity between the two action sets. Two spatial graph kernels are constructed, with vertex and edge data represented by joint positions and joint distances, respectively. To evaluate the proposed method, we use four publicly available 3D skeletal datasets: G3D, MSR Action 3D, UT Kinect and NTU RGB+D. The proposed spatial graph kernels yield better classification accuracies than state-of-the-art models.
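The abstract's pipeline (per-frame spatial graphs from 3D joints, then a kernel comparing two graphs) can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: the joint count, the RBF form of the kernel, and the `sigma` bandwidth are assumptions introduced here for the example.

```python
import numpy as np

def spatial_graph(joints):
    """Build a spatial graph from one skeleton frame: vertices are the
    3D joint positions, edges carry pairwise Euclidean joint distances.
    `joints` has shape (J, 3); returns a (J, J) edge-weight matrix."""
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def rbf_graph_kernel(g1, g2, sigma=1.0):
    """Illustrative RBF kernel on the edge-weight matrices of two
    same-size spatial graphs (a stand-in for the paper's kernels)."""
    return float(np.exp(-np.sum((g1 - g2) ** 2) / (2.0 * sigma ** 2)))

# Example: compare two 20-joint skeleton frames, one slightly perturbed.
rng = np.random.default_rng(0)
frame_a = rng.standard_normal((20, 3))
frame_b = frame_a + 0.01 * rng.standard_normal((20, 3))

k_self = rbf_graph_kernel(spatial_graph(frame_a), spatial_graph(frame_a))
k_near = rbf_graph_kernel(spatial_graph(frame_a), spatial_graph(frame_b))
# k_self is exactly 1.0; k_near is slightly below 1.0, since the
# perturbed frame's joint-distance matrix differs only marginally.
```

In a classification setting, such frame-level kernel values would be aggregated over an action sequence into a train-vs-test similarity matrix and fed to a kernel classifier; those steps are omitted here.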

