Visual recognition and classification of videos using deep convolutional neural networks

  • Authors

    • N Shobha Rani
    • Pramod N. Rao
    • Paul Clinton
    2018-05-29
    https://doi.org/10.14419/ijet.v7i2.31.13403
  • Keywords: Sports videos, convolutional neural networks, local binary patterns, bag of words features, SURF, K-Means clustering, video processing
  • Abstract: Classification of videos based on their content is a challenging and significant research problem. In this paper, a simple and efficient model is proposed for classifying sports videos using deep convolutional neural networks. In the proposed approach, grayscale variants of the image frames are classified through convolution at varied levels of abstraction across a sequence of hidden layers. The image frames considered for classification are obtained after duplicate-frame elimination, and each frame is rescaled to 120x240. The sports video categories used for experimentation are badminton, football, cricket and tennis, downloaded from various sources on Google and YouTube. Classification in the proposed method is performed with a Deep Convolutional Neural Network (DCNN) using about 20 filters, each of size 5x5 with a stride length of 2, and its outcomes are compared with the Local Binary Patterns (LBP) and Bag of Words Features (BWF) techniques. In the BWF technique, SURF features are extracted and the 80% strongest feature points are used to cluster the image frames with K-Means clustering, achieving an average classification accuracy of about 87%. The LBP technique produced an average accuracy of 73% in differentiating one image frame from another, whereas the DCNN showed a promising outcome with an accuracy of about 91% for a 40% training / 60% test split and 99% for a 60% training / 40% test split. The results show that the proposed method outperforms the image-processing-based LBP and BWF techniques. (An illustrative sketch of such a network is given after the abstract.)

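    Below is a minimal sketch of a DCNN classifier reflecting only the values stated in the abstract: grayscale frames rescaled to 120x240, 20 convolution filters of size 5x5 with a stride of 2, and four sport classes (badminton, football, cricket, tennis). The choice of Keras/TensorFlow, the pooling and dense layers, the optimizer and the training schedule are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of a DCNN sports-video frame classifier. Stated values from the
# abstract: grayscale 120x240 input frames, 20 filters of size 5x5, stride 2,
# four classes. Everything else (pooling, dense width, optimizer, epochs) is an
# assumption for illustration.
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 4              # badminton, football, cricket, tennis
FRAME_SHAPE = (120, 240, 1)  # grayscale frames rescaled to 120x240

def build_dcnn():
    model = models.Sequential([
        layers.Input(shape=FRAME_SHAPE),
        # 20 filters of size 5x5 with a stride length of 2, as in the abstract
        layers.Conv2D(20, kernel_size=(5, 5), strides=2, activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),        # assumed pooling stage
        layers.Flatten(),
        layers.Dense(64, activation="relu"),          # assumed dense-layer width
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Dummy frames stand in for the de-duplicated, rescaled video frames;
    # a 60/40 train/test split mirrors the best-performing setup reported.
    frames = np.random.rand(100, 120, 240, 1).astype("float32")
    labels = np.random.randint(0, NUM_CLASSES, size=100)
    split = int(0.6 * len(frames))
    model = build_dcnn()
    model.fit(frames[:split], labels[:split], epochs=2, verbose=0)
    print(model.evaluate(frames[split:], labels[split:], verbose=0))
```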

  • References

      [1] Krizhevsky A, Sutskever I & Hinton GE, “Imagenet classification with deep convolutional neural networks”, Advances in Neural Information Processing Systems, (2012), pp.1097-1105.

      [2] Ciregan D, Meier U & Schmidhuber J, “Multi-column deep neural networks for image classification”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2012), pp.3642-3649.

      [3] Simonyan K & Zisserman A, “Very deep convolutional networks for large-scale image recognition”, arXiv preprint arXiv:1409.1556, (2014).

      [4] Zeiler MD & Fergus R, “Visualizing and understanding convolutional networks”, European Conference on Computer Vision, (2014), pp.818-833.

      [5] Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R & LeCun Y, “Overfeat: Integrated recognition, localization and detection using convolutional networks”, arXiv preprint arXiv:1312.6229, (2013).

      [6] Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R & Fei-Fei L, “Large-scale video classification with convolutional neural networks”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2014), pp.1725-1732.

      [7] Ng JYH, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R & Toderici G, “Beyond short snippets: Deep networks for video classification”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), pp.4694-4702.

      [8] Brezeale D & Cook DJ, “Automatic video classification: A survey of the literature”, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), Vol.38, No.3, (2008), pp.416-430.

      [9] Zhou W, Vellaikal A & Kuo CC, “Rule-based video classification system for basketball video indexing”, Proceedings of the ACM Workshops on Multimedia, (2000), pp.213-216.

      [10] Huang J, Liu Z, Wang Y, Chen Y & Wong EK, “Integration of multimodal features for video scene classification based on HMM”, IEEE 3rd Workshop on Multimedia Signal Processing, (1999), pp.53-58.

      [11] Lin WH & Hauptmann A, “News video classification using SVM-based multimodal classifiers and combination strategies”, Proceedings of the Tenth ACM International Conference on Multimedia, (2002), pp.323-326.

      [12] Xu LQ & Li Y, “Video classification using spatial-temporal features and PCA”, International Conference on Multimedia and Expo, (2003).

      [13] Dimitrova N, Agnihotri L & Wei G, “Video classification based on HMM using text and faces”, 10th European Signal Processing Conference, (2000), pp.1-4.

      [14] Yang J, Jiang YG, Hauptmann AG & Ngo CW, “Evaluating bag-of-visual-words representations in scene classification”, Proceedings of the International Workshop on Multimedia Information Retrieval, (2007), pp.197-206.

      [15] Zhao G, Ahonen T, Matas J & Pietikainen M, “Rotation-invariant image and video description with local binary pattern features”, IEEE Transactions on Image Processing, Vol.21, No.4, (2012), pp.1465-1477.

      [16] Lippmann RP, “Pattern classification using neural networks”, IEEE Communications Magazine, Vol.27, No.11, (1989), pp.47-50.

      [17] Hinton GE, Osindero S & Teh YW, “A fast learning algorithm for deep belief nets”, Neural Computation, Vol.18, No.7, (2006), pp.1527-1554.

      [18] Rani NS & Ashwini PS, “A Standardized Framework for Handwritten and Printed Kannada Numeral Recognition and Translation using Probabilistic Neural Networks”, IJISET - International Journal of Innovative Science, Engineering & Technology, Vol.1, No.4, (2014).

      [19] Pushpa BR, Anand C & Mithun NP, “Ayurvedic Plant Species Recognition using Statistical Parameters on Leaf Images”, International Journal of Applied Engineering Research, Vol.11, No.7, (2016), pp.5142-5147.

  • How to Cite

    Shobha Rani, N., Rao, P. N., & Clinton, P. (2018). Visual recognition and classification of videos using deep convolutional neural networks. International Journal of Engineering & Technology, 7(2.31), 85-88. https://doi.org/10.14419/ijet.v7i2.31.13403