A survey on video classification using action recognition

  • Authors

    • Caleb Andrew
    • Rex Fiona
    2018-05-29
    https://doi.org/10.14419/ijet.v7i2.31.13404
  • Video classification, machine learning, multiple instance learning (MIL), conditional random fields (CRFs), action recognition, gesture recognition
  • The growth of multimedia technology has resulted in a wide variety of videos being produced every day. These videos need to be classified so that people can find the video they are searching for when they need it. Video classification can be framed as a probabilistic data classification problem, which is a subcategory of machine learning. Classification supports indexing, analysis, and search. This paper surveys the technologies currently used for video classification, including Multiple Instance Learning (MIL), Conditional Random Fields (CRFs), and classification based on actions and gestures.
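The abstract names MIL and CRFs without showing how they apply to video. The Python sketch below is illustrative only and is not taken from any of the surveyed methods. It assumes each video has already been split into clips and that some clip-level classifier (hypothetical, not specified here) outputs the probability that the action appears in each clip. `mil_bag_label` applies the standard MIL assumption: a bag (the video) is positive if at least one instance (a clip) is positive. `viterbi` performs MAP decoding of a linear-chain CRF over the clip labels, so neighbouring clips influence each other. The function names, the 0.5 threshold, and the switching penalty are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def mil_bag_label(clip_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Standard MIL assumption: a bag (here, a video) is positive if at least
    one instance (here, a clip) is positive, i.e. max-pooling over clips."""
    if not clip_scores:
        raise ValueError("a video must contain at least one clip")
    return int(max(clip_scores) >= threshold)


def to_unary(probs: Sequence[float]) -> List[List[float]]:
    """Turn per-clip probabilities p into log-potentials [log(1-p), log(p)]
    for label 0 ('no action') and label 1 ('action'). Requires 0 < p < 1."""
    return [[math.log(1.0 - p), math.log(p)] for p in probs]


def viterbi(unary: List[List[float]], pairwise: List[List[float]]) -> List[int]:
    """MAP decoding of a linear-chain CRF.

    unary[t][y]:    log-potential of label y at clip t
    pairwise[a][b]: log-potential of label a at clip t-1 followed by b at clip t
    Returns the label sequence with the highest total log-potential.
    """
    n_labels = len(unary[0])
    best = [list(unary[0])]      # best[t][y]: top score of any path ending in y at t
    back: List[List[int]] = []   # back[t-1][y]: predecessor label on that path
    for t in range(1, len(unary)):
        row, ptr = [], []
        for y in range(n_labels):
            cand = [best[-1][a] + pairwise[a][y] for a in range(n_labels)]
            a_best = max(range(n_labels), key=lambda a: cand[a])
            row.append(cand[a_best] + unary[t][y])
            ptr.append(a_best)
        best.append(row)
        back.append(ptr)
    # Start from the best final label and follow the back-pointers.
    y = max(range(n_labels), key=lambda k: best[-1][k])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


if __name__ == "__main__":
    probs = [0.2, 0.6, 0.2, 0.9, 0.8]        # hypothetical clip-level scores
    independent = [[0.0, 0.0], [0.0, 0.0]]   # no coupling between clips
    sticky = [[0.0, -1.0], [-1.0, 0.0]]      # penalise switching labels
    print(mil_bag_label(probs))                   # 1
    print(viterbi(to_unary(probs), independent))  # [0, 1, 0, 1, 1]
    print(viterbi(to_unary(probs), sticky))       # [0, 0, 0, 1, 1]
```

With independent pairwise potentials the decoder simply picks each clip's most likely label. The switching penalty suppresses the isolated 0.6 score at the second clip while keeping the sustained action at the end; the smoothed clip labels could then be pooled with the MIL rule to label the whole video.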

     

  • How to Cite

    Andrew, C., & Fiona, R. (2018). A survey on video classification using action recognition. International Journal of Engineering & Technology, 7(2.31), 89-93. https://doi.org/10.14419/ijet.v7i2.31.13404