Scalable density based spatial clustering with integrated one-class SVM for noise reduction

  • Authors

    • K. Nafees Ahmed, Bharathidasan University
    • T. Abdul Razak, Bharathidasan University
    https://doi.org/10.14419/ijet.v7i2.9.10093

    Received date: March 12, 2018

    Accepted date: March 25, 2018

    Published date: April 29, 2018

  • Keywords: DBSCAN, One-Class SVM, Noise Reduction, Clustering, Spark.
  • Abstract

    Extracting information from data is one of the key requirements of data analysis. The unsupervised nature of the data calls for complex computational methods of analysis. This paper presents NRDBSCAN, a modified variant of DBSCAN that integrates a one-class SVM, a machine learning technique, into density based spatial clustering for noise reduction. Analysis of DBSCAN shows that it depends heavily on accurate thresholds and yields suboptimal results without them. However, identifying accurate threshold settings is often unattainable in practice. Noise is one of the major side effects of this threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operational structure of DBSCAN. Further, the proposed technique is parallelized on the Spark architecture, which increases its scalability and its ability to handle large amounts of data. Experiments and comparisons with similar techniques indicate high scalability and high homogeneity in the clustering process.
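The abstract does not specify how the classifier is wired into DBSCAN, so the following is only a minimal sketch under one assumed integration: run plain DBSCAN, fit a one-class SVM on each resulting cluster, and use those models to re-examine the points DBSCAN labelled as noise. It uses scikit-learn's `DBSCAN` and `OneClassSVM`. The function name `nr_dbscan_sketch` and the default values of `eps`, `min_samples`, `nu` and `gamma` are hypothetical choices for illustration. This is not the authors' NRDBSCAN, and the Spark parallelization is omitted.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.svm import OneClassSVM


def nr_dbscan_sketch(X, eps=0.5, min_samples=5, nu=0.1, gamma="scale"):
    """Hypothetical DBSCAN + one-class SVM noise reduction (illustrative only).

    1. Run plain DBSCAN; points it cannot place get label -1 (noise).
    2. Fit one OneClassSVM per DBSCAN cluster, on that cluster's members.
    3. Re-examine each noise point: if at least one cluster model accepts it
       (decision_function > 0), assign it to the cluster with the highest
       score; otherwise it stays noise.

    Returns (original DBSCAN labels, refined labels).
    """
    X = np.asarray(X, dtype=float)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    cluster_ids = np.unique(labels[labels != -1])
    noise_idx = np.flatnonzero(labels == -1)
    if cluster_ids.size == 0 or noise_idx.size == 0:
        return labels, labels.copy()

    # One column of decision scores per cluster model, evaluated on noise points.
    scores = np.column_stack([
        OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
        .fit(X[labels == c])
        .decision_function(X[noise_idx])
        for c in cluster_ids
    ])
    best = scores.argmax(axis=1)                       # best-scoring cluster per noise point
    accepted = scores[np.arange(len(noise_idx)), best] > 0

    refined = labels.copy()
    refined[noise_idx[accepted]] = cluster_ids[best[accepted]]
    return labels, refined


# Usage: two dense blobs, two points near their fringes, one far outlier.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(40, 2)),
    rng.normal([4.0, 4.0], 0.3, size=(40, 2)),
    [[1.0, 0.2], [4.9, 4.1], [10.0, -10.0]],
])
before, after = nr_dbscan_sketch(X, eps=0.5, min_samples=5)
print("noise before:", int((before == -1).sum()), "after:", int((after == -1).sum()))
```

In this sketch, clustered points are never relabelled; only noise points can move, and a point that no cluster model accepts (such as the far outlier) remains noise. Comparing decision scores from separately trained models is a heuristic, since each model's scale depends on its own data and `gamma`.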

  • How to Cite

    Ahmed, K. N., & Razak, T. A. (2018). Scalable density based spatial clustering with integrated one-class SVM for noise reduction. International Journal of Engineering and Technology, 7(2.9), 28-32. https://doi.org/10.14419/ijet.v7i2.9.10093