Decision Trees for the Early Identification of University Students at Risk of Desertion


  • Mayra Albán
  • David Mauricio





Prediction of college desertion, machine learning, decision trees, CHAID.


Student dropout is a topic that has generated controversy in Higher Education Institutions. It has negative effects on the social, academic and economic circumstances of students. One approach to predicting university dropout is to apply machine learning techniques such as decision trees, prediction models that use logical construction diagrams to characterize student behavior and to identify early the students who are at risk of leaving university. Based on a survey of 3162 students, 10 variables that influence dropout were identified, and a CHAID decision tree model is proposed that achieves 97.95% accuracy in predicting university student dropout. The proposed prediction model allows university administrators to develop effective intervention strategies so that students can complete their university careers successfully.
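As a rough illustration of the idea behind CHAID, the sketch below ranks categorical predictors by their Pearson chi-square statistic against the dropout label and picks the strongest one as the split variable. It is a simplified sketch and not the authors' model: full CHAID also merges similar categories and compares Bonferroni-adjusted p-values, whereas this example compares raw statistics, which is only equivalent when the predictors have the same number of categories. The variable names (`works`, `scholarship`) and the eight toy records are hypothetical and are not taken from the survey.

```python
from collections import Counter


def chi_square(feature, target):
    """Pearson chi-square statistic for two equally long categorical lists."""
    n = len(feature)
    row_totals = Counter(feature)
    col_totals = Counter(target)
    observed = Counter(zip(feature, target))
    stat = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def best_split(rows, predictors, target):
    """Return the predictor with the largest chi-square statistic, plus all scores."""
    y = [row[target] for row in rows]
    scores = {p: chi_square([row[p] for row in rows], y) for p in predictors}
    return max(scores, key=scores.get), scores


# Hypothetical toy records; the column names are assumptions for illustration only.
rows = [
    {"works": "yes", "scholarship": "yes", "dropout": "yes"},
    {"works": "yes", "scholarship": "no", "dropout": "yes"},
    {"works": "yes", "scholarship": "yes", "dropout": "yes"},
    {"works": "yes", "scholarship": "no", "dropout": "no"},
    {"works": "no", "scholarship": "yes", "dropout": "no"},
    {"works": "no", "scholarship": "no", "dropout": "no"},
    {"works": "no", "scholarship": "yes", "dropout": "no"},
    {"works": "no", "scholarship": "no", "dropout": "yes"},
]

best, scores = best_split(rows, ["works", "scholarship"], "dropout")
print(best, scores)
```

In a full tree, the same selection step would be repeated within each resulting group of students until no predictor shows a significant association with dropout.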






How to Cite

Albán, M., & Mauricio, D. (2018). Decision Trees for the Early Identification of University Students at Risk of Desertion. International Journal of Engineering & Technology, 7(4.44), 51–54.
Received 2019-01-31
Accepted 2019-01-31
Published 2018-12-01