A survey on outlier detection methods in data mining

  • Authors

    • Roy Thomas Noorul Islam Centre for Higher Education
    • J. E. Judith Noorul Islam Centre for Higher Education
    2019-06-12
    https://doi.org/10.14419/ijet.v7i4.23153
  • Classification, Clustering, Data mining, Outliers, Proximity.
  • Outliers are data objects whose characteristics differ from the mainstream characteristics of the data objects in a data set. Outlier detection plays a vital role in statistics as well as in data mining, because it helps uncover hidden and important information in large data sets. It has been an active research field for the past few decades, with diverse applications such as detecting malicious activity in cyber security, finding fraudulent transactions in banking, detecting abnormalities in medical data, and identifying defects in industrial products. Various methods have been developed for detecting outliers; most are designed for specific applications, while others are generic. Outlier detection methods are grouped into supervised, unsupervised and semi-supervised methods depending on the availability of class labels. They can also be classified as statistical, proximity-based, clustering-based and classification-based, depending on the type of data. In this paper, we present the relative advantages and limitations of various methods used for detecting outliers.
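
To make two of the categories named in the abstract concrete, the following is a minimal sketch of one statistical detector (a z-score test) and one proximity-based detector (distance to the k-th nearest neighbour). The function names, parameters, and thresholds are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import statistics


def zscore_outliers(values, threshold=2.5):
    """Statistical approach (illustrative): flag values whose distance from
    the mean, measured in population standard deviations, exceeds threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing can stand out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def knn_distance_outliers(points, k=2, threshold=3.0):
    """Proximity-based approach (illustrative): flag points whose Euclidean
    distance to their k-th nearest neighbour exceeds threshold."""
    flagged = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if dists[k - 1] > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
    print(zscore_outliers(readings, threshold=2.5))

    cloud = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
    print(knn_distance_outliers(cloud, k=2, threshold=3.0))
```

Both functions return the indices of the flagged items. Neither needs class labels, so under the abstract's other grouping they would count as unsupervised methods. The thresholds are arbitrary and would have to be tuned to the data at hand.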

     

     

  • How to Cite

    Thomas, R., & Judith, J. E. (2019). A survey on outlier detection methods in data mining. International Journal of Engineering & Technology, 7(4), 6309-6312. https://doi.org/10.14419/ijet.v7i4.23153