Analyzing the predictive capacity of various machine learning algorithms

  • Authors

    • Soly Mathew Biju, University of Wollongong in Dubai (UOWD)

      Published: 2018-08-23
      https://doi.org/10.14419/ijet.v7i2.27.11013
  • Keywords: Generalized Linear Models (GLM), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forests (RF), Machine Learning Algorithms.
  • Abstract: The purpose of this study is to deploy and evaluate the performance of new-age machine learning algorithms and their applicability in a business environment. Three unique datasets were used to evaluate the true performance of the top four machine learning algorithms: Generalized Linear Models (GLM), Support Vector Machine (SVM), K-nearest neighbor (KNN), and Random Forests. The findings of this study reveal that although these algorithms take different approaches to solving classification and regression problems, they all develop quite robust models by learning the hidden patterns in the datasets. The findings of this study can be used by other companies and individuals when analyzing and solving their respective business problems. Although a number of studies exist in which new-age machine learning algorithms are tested and evaluated, there are none in which the performance of these algorithms was tested on datasets of different sizes and types.
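The study's own code and datasets are not reproduced on this page. As a rough illustration of one of the four algorithms it compares, here is a minimal pure-Python sketch of K-nearest neighbor classification on a hypothetical toy dataset (the dataset, the choice of k, and the distance metric are assumptions for illustration, not the paper's setup):

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training points, using Euclidean distance."""
    # Pair each training point with its distance to the query, sorted ascending.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    # Majority vote over the labels of the k closest points.
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy 2-D dataset: two well-separated clusters.
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
         (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train, labels, (0.15, 0.1)))  # query near cluster "a"
print(knn_predict(train, labels, (5.05, 5.0)))  # query near cluster "b"
```

The same vote-over-neighbors idea extends to regression by averaging the k neighbors' target values instead of voting.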




  • How to Cite

    Mathew Biju, S. (2018). Analyzing the predictive capacity of various machine learning algorithms. International Journal of Engineering & Technology, 7(2.27), 266-270. https://doi.org/10.14419/ijet.v7i2.27.11013