An Improved Ensemble Based Technique for Handling Noisy Class Imbalanced Education Data for Prediction of Students Dropout in HEI

  • Authors

    • Mr. S. Sangeetha, Research Scholar, Department of Computer Science, Dr. SNS Rajalakshmi College of Arts & Science, Coimbatore
    • Dr. S. Shanmugapriya, Associate Professor & Head, Department of Computer Applications (PG), Dr. SNS Rajalakshmi College of Arts & Science, Coimbatore
    https://doi.org/10.14419/z3vc6t60

    Received date: May 28, 2025

    Accepted date: June 27, 2025

    Published date: July 8, 2025

  • Keywords: Educational Data Mining (EDM); Student Performance; Imbalanced Data; Class Imbalance; Oversampling; Prediction; Ensemble Model.
  • Abstract

    The education sector is increasingly interested in intelligent technology. The rapid growth of educational data means that standard processing methods may be limited and may even distort results, which makes it important to rethink how data mining is applied in education. Educational databases accumulate student information daily, so the knowledge extracted from them must be updated regularly. When student data arrives as a continuous stream, the challenge is to turn this large volume of data into usable information and to adapt existing knowledge as new data arrives. Class imbalance is a critical issue when some classes have few instances, and noisy, class-imbalanced datasets significantly degrade machine learning classification. This research proposes an enhanced hybrid bag-boost model built on a proposed resampling technique for noisy, imbalanced datasets. The resampling technique combines K-Means SMOTE (Synthetic Minority Oversampling Technique) as the oversampling method with Edited Nearest Neighbor (ENN), an undersampling technique used to eliminate noise. It works in three stages: first, the dataset is clustered using K-Means; second, SMOTE handles the imbalance within clusters by introducing synthetic minority-class instances; and third, ENN removes the instances that generate noise. According to the experimental results, the proposed model outperforms the other models compared. It has also been verified that the proposed method performs better on binary imbalanced datasets as the noise proportion increases.
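The three resampling stages named in the abstract (K-Means clustering, SMOTE inside clusters, ENN cleaning) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the toy data, the choice of 3 clusters and 3 neighbours are all assumptions made for illustration. It also simplifies K-Means SMOTE by using every cluster that holds at least two minority points and by interpolating between any two minority points of the same cluster, rather than filtering clusters and using nearest neighbours.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # leave an empty cluster's center where it was
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def smote_in_clusters(X, y, minority, k, rng):
    """Stages 1 and 2: cluster all points, then interpolate new minority
    points between pairs of minority points that share a cluster."""
    labels = kmeans(X, k, rng)
    counts = Counter(y)
    needed = max(counts.values()) - counts[minority]
    groups = []
    for c in range(k):
        mins = [p for p, l, t in zip(X, labels, y) if l == c and t == minority]
        if len(mins) >= 2:
            groups.append(mins)
    synthetic = []
    while groups and len(synthetic) < needed:
        a, b = rng.sample(rng.choice(groups), 2)
        gap = rng.random()
        synthetic.append(tuple(u + gap * (v - u) for u, v in zip(a, b)))
    return X + synthetic, y + [minority] * len(synthetic)

def edited_nearest_neighbours(X, y, n_neighbors=3):
    """Stage 3: drop every instance whose label disagrees with the
    majority label of its nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (p, t) in enumerate(zip(X, y)):
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: dist2(p, X[j]))
        votes = Counter(y[j] for j in others[:n_neighbors])
        if votes.most_common(1)[0][0] == t:
            keep_X.append(p)
            keep_y.append(t)
    return keep_X, keep_y

# Toy data (invented): 36 majority points, 9 minority points, and one
# majority-labelled noise point sitting inside the minority region.
majority = [(i * 0.5, j * 0.5) for i in range(6) for j in range(6)]
minority = [(10 + i * 0.5, 10 + j * 0.5) for i in range(3) for j in range(3)]
noise = (10.25, 10.25)
X = majority + minority + [noise]
y = [0] * len(majority) + [1] * len(minority) + [0]

rng = random.Random(0)
X_bal, y_bal = smote_in_clusters(X, y, minority=1, k=3, rng=rng)
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal)

print(sorted(Counter(y).items()))        # [(0, 37), (1, 9)]
print(sorted(Counter(y_bal).items()))    # [(0, 37), (1, 37)]
print(sorted(Counter(y_clean).items()))  # [(0, 36), (1, 37)]
```

In this toy run the oversampling step balances the two classes and the ENN step removes only the mislabelled point. For real experiments, the imbalanced-learn library provides `KMeansSMOTE` and `EditedNearestNeighbours` classes that would likely be a better starting point than this sketch.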

  • How to Cite

    Sangeetha, S., & Shanmugapriya, S. (2025). An Improved Ensemble Based Technique for Handling Noisy Class Imbalanced Education Data for Prediction of Students Dropout in HEI. International Journal of Basic and Applied Sciences, 14(SI-1), 258-263. https://doi.org/10.14419/z3vc6t60