Analysis of supervised and unsupervised technique for authentication dataset


  • Rahul K. Dubey
  • P. K. Nizar Banu



Traditional methods of data storage differ from those in use today. Data has become increasingly unstructured and must be read in context. Data science provides a platform for applying artificial intelligence and deep learning methodologies to large volumes of structured and unstructured data. AI is showing its true potential by addressing social causes and enabling automation in industries such as automobiles, medicine, smart buildings, healthcare, retail, banking, and financial services. Fed by a flood of data from a variety of sources, AI and machine learning are finding real-world adoption and applications. Building data models is a trial-and-error process: a model is liable to change as more is discovered about the specific problem, and the same holds for the algorithms used. In this paper, we apply unsupervised learning algorithms (k-means and bat k-means) and supervised learning algorithms (decision tree, k-NN, support vector machine, regression, discriminant analysis, and ensemble classification) to the phishing website, website phishing, and Z-Alizadeh Sani datasets taken from the UCI repository and to an authentication dataset. The authentication dataset was generated for testing Single Sign-On, and the models learn from it through training in order to make predictions.
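The distinction between the two families of algorithms named above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the data are synthetic two-dimensional points rather than the UCI or authentication datasets, only k-NN and plain k-means are shown (bat k-means and the other classifiers are omitted), and the function names `make_data`, `knn_predict`, and `kmeans` are hypothetical. The k-means seeding is a simple deterministic farthest-point rule chosen here for reproducibility, which is an assumption of this sketch.

```python
import math
import random


def make_data(seed=0):
    """Two synthetic 2-D groups with labels 0 and 1 (illustrative data only)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (10.0, 10.0)]):
        for _ in range(30):
            points.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
            labels.append(label)
    return points, labels


def knn_predict(train_x, train_y, query, k=3):
    """Supervised: majority vote among the k nearest labelled points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def assign_clusters(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k=2, iters=20):
    """Unsupervised: Lloyd's iterations; class labels are never consulted."""
    # Assumed seeding: start at the first point, then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        assign = assign_clusters(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assign_clusters(points, centroids), centroids


if __name__ == "__main__":
    points, labels = make_data()
    # Supervised: labels are used to classify an unseen query point.
    print("k-NN prediction:", knn_predict(points, labels, (0.5, 0.5)))
    # Unsupervised: clusters are found from the points alone.
    assign, centroids = kmeans(points, k=2)
    print("k-means centroids:", centroids)
```

The classifier needs the label list to answer a query, whereas the clustering routine receives only the points; any comparison of its clusters with the true classes is done after the fact.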




How to Cite

Dubey, R. K., & Banu, P. K. N. (2018). Analysis of supervised and unsupervised technique for authentication dataset. International Journal of Engineering & Technology, 7(4), 2867–2873.