Analyzing the predictive capacity of various machine learning algorithms

  • Authors

    • Soly Mathew Biju, University of Wollongong in Dubai (UOWD)

      Published: 2018-08-23
      https://doi.org/10.14419/ijet.v7i2.27.11013
  • Keywords: Generalized Linear Models (GLM), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forests (RF), Machine Learning Algorithms.
  • Abstract: The purpose of this study is to deploy and evaluate the performance of new-age machine learning algorithms and their applicability in a business environment. Three unique datasets were used to evaluate the true performance of the top four machine learning algorithms: Generalized Linear Models (GLM), Support Vector Machine (SVM), K-nearest neighbor (KNN), and Random Forests. The findings of this study reveal that although these algorithms take different approaches to solving classification and regression problems, they all develop quite robust models by learning the hidden patterns in the datasets. The findings of this study can be used by other companies and individuals when analyzing and solving their respective business problems. Although a number of studies exist in which new-age machine learning algorithms are tested and evaluated, there are none in which the performance of these algorithms was tested on datasets of different sizes and types.
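The study's own code and datasets are not reproduced on this page. As a rough illustration of one of the four algorithms it compares, here is a minimal pure-Python sketch of K-nearest neighbor classification on a hypothetical toy dataset (the dataset, the choice of k, and the distance metric are assumptions for illustration, not the paper's setup):

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training points, using Euclidean distance."""
    # Pair each training point with its distance to the query, sorted ascending.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    # Majority vote over the labels of the k closest points.
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy 2-D dataset: two well-separated clusters.
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
         (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train, labels, (0.15, 0.1)))  # query near cluster "a"
print(knn_predict(train, labels, (5.05, 5.0)))  # query near cluster "b"
```

The same vote-over-neighbors idea extends to regression by averaging the k neighbors' target values instead of voting.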




  • How to Cite

    Mathew Biju, S. (2018). Analyzing the predictive capacity of various machine learning algorithms. International Journal of Engineering & Technology, 7(2.27), 266-270. https://doi.org/10.14419/ijet.v7i2.27.11013