Visual recognition and classification of videos using deep convolutional neural networks

  • Authors

    • N Shobha Rani
    • Pramod N. Rao
    • Paul Clinton
    2018-05-29
    https://doi.org/10.14419/ijet.v7i2.31.13403
  • Keywords: Sports videos, convolutional neural networks, local binary patterns, bag of words features, SURF, K-Means clustering, video processing
  • Abstract: Classification of videos based on their content is a challenging and significant research problem. In this paper, a simple and efficient model is proposed for classifying sports videos using deep convolutional neural networks. In the proposed approach, grayscale variants of the image frames are classified through convolution at varied levels of abstraction across a sequence of hidden layers. The image frames considered for classification are obtained after duplicate-frame elimination, and each frame is rescaled to 120x240. The sports video categories used for experimentation are badminton, football, cricket and tennis, downloaded from various sources on Google and YouTube. Classification in the proposed method is performed with a Deep Convolutional Neural Network (DCNN) using about 20 filters, each of size 5x5 with a stride length of 2, and its outcomes are compared with the Local Binary Patterns (LBP) and Bag of Words Features (BWF) techniques. In the BWF technique, SURF features are extracted and the 80% strongest feature points are used to cluster the image frames with K-Means clustering, achieving an average classification accuracy of about 87%. The LBP technique produced an average accuracy of 73% in differentiating one image frame from another, whereas the DCNN showed a promising outcome with an accuracy of about 91% for a 40% training / 60% test split and 99% for a 60% training / 40% test split. The results show that the proposed method outperforms the image-processing-based LBP and BWF techniques. (An illustrative sketch of such a network is given after the abstract.)

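    Below is a minimal sketch of a DCNN classifier reflecting only the values stated in the abstract: grayscale frames rescaled to 120x240, 20 convolution filters of size 5x5 with a stride of 2, and four sport classes (badminton, football, cricket, tennis). The choice of Keras/TensorFlow, the pooling and dense layers, the optimizer and the training schedule are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of a DCNN sports-video frame classifier. Stated values from the
# abstract: grayscale 120x240 input frames, 20 filters of size 5x5, stride 2,
# four classes. Everything else (pooling, dense width, optimizer, epochs) is an
# assumption for illustration.
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 4              # badminton, football, cricket, tennis
FRAME_SHAPE = (120, 240, 1)  # grayscale frames rescaled to 120x240

def build_dcnn():
    model = models.Sequential([
        layers.Input(shape=FRAME_SHAPE),
        # 20 filters of size 5x5 with a stride length of 2, as in the abstract
        layers.Conv2D(20, kernel_size=(5, 5), strides=2, activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),        # assumed pooling stage
        layers.Flatten(),
        layers.Dense(64, activation="relu"),          # assumed dense-layer width
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Dummy frames stand in for the de-duplicated, rescaled video frames;
    # a 60/40 train/test split mirrors the best-performing setup reported.
    frames = np.random.rand(100, 120, 240, 1).astype("float32")
    labels = np.random.randint(0, NUM_CLASSES, size=100)
    split = int(0.6 * len(frames))
    model = build_dcnn()
    model.fit(frames[:split], labels[:split], epochs=2, verbose=0)
    print(model.evaluate(frames[split:], labels[split:], verbose=0))
```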

  • References

      [1] Krizhevsky A, Sutskever I & Hinton GE, “Imagenet classification with deep convolutional neural networks”, Advances in Neural Information Processing Systems, (2012), pp.1097-1105.

      [2] Ciregan D, Meier U & Schmidhuber J, “Multi-column deep neural networks for image classification”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2012), pp.3642-3649.

      [3] Simonyan K & Zisserman A, “Very deep convolutional networks for large-scale image recognition”, arXiv preprint arXiv:1409.1556, (2014).

      [4] Zeiler MD & Fergus R, “Visualizing and understanding convolutional networks”, European Conference on Computer Vision, (2014), pp.818-833.

      [5] Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R & LeCun Y, “Overfeat: Integrated recognition, localization and detection using convolutional networks”, arXiv preprint arXiv:1312.6229, (2013).

      [6] Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R & Fei-Fei L, “Large-scale video classification with convolutional neural networks”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2014), pp.1725-1732.

      [7] Ng JYH, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R & Toderici G, “Beyond short snippets: Deep networks for video classification”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), pp.4694-4702.

      [8] Brezeale D & Cook DJ, “Automatic video classification: A survey of the literature”, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), Vol.38, No.3, (2008), pp.416-430.

      [9] Zhou W, Vellaikal A & Kuo CC, “Rule-based video classification system for basketball video indexing”, Proceedings of the ACM Workshops on Multimedia, (2000), pp.213-216.

      [10] Huang J, Liu Z, Wang Y, Chen Y & Wong EK, “Integration of multimodal features for video scene classification based on HMM”, IEEE 3rd Workshop on Multimedia Signal Processing, (1999), pp.53-58.

      [11] Lin WH & Hauptmann A, “News video classification using SVM-based multimodal classifiers and combination strategies”, Proceedings of the Tenth ACM International Conference on Multimedia, (2002), pp.323-326.

      [12] Xu LQ & Li Y, “Video classification using spatial-temporal features and PCA”, International Conference on Multimedia and Expo, (2003).

      [13] Dimitrova N, Agnihotri L & Wei G, “Video classification based on HMM using text and faces”, 10th European Signal Processing Conference, (2000), pp.1-4.

      [14] Yang J, Jiang YG, Hauptmann AG & Ngo CW, “Evaluating bag-of-visual-words representations in scene classification”, Proceedings of the International Workshop on Multimedia Information Retrieval, (2007), pp.197-206.

      [15] Zhao G, Ahonen T, Matas J & Pietikainen M, “Rotation-invariant image and video description with local binary pattern features”, IEEE Transactions on Image Processing, Vol.21, No.4, (2012), pp.1465-1477.

      [16] Lippmann RP, “Pattern classification using neural networks”, IEEE Communications Magazine, Vol.27, No.11, (1989), pp.47-50.

      [17] Hinton GE, Osindero S & Teh YW, “A fast learning algorithm for deep belief nets”, Neural Computation, Vol.18, No.7, (2006), pp.1527-1554.

      [18] Rani NS & Ashwini PS, “A Standardized Framework for Handwritten and Printed Kannada Numeral Recognition and Translation using Probabilistic Neural Networks”, IJISET - International Journal of Innovative Science, Engineering & Technology, Vol.1, No.4, (2014).

      [19] Pushpa BR, Anand C & Mithun NP, “Ayurvedic Plant Species Recognition using Statistical Parameters on Leaf Images”, International Journal of Applied Engineering Research, Vol.11, No.7, (2016), pp.5142-5147.

  • How to Cite

    Shobha Rani, N., Rao, P. N., & Clinton, P. (2018). Visual recognition and classification of videos using deep convolutional neural networks. International Journal of Engineering & Technology, 7(2.31), 85-88. https://doi.org/10.14419/ijet.v7i2.31.13403