An Ensemble approach to identifying the student gender towards information and communication technology awareness in European schools using machine learning


  • Chaman Verma
  • Veronika Stoffová
  • Zoltán Illés



Data mining and machine learning play an important role in both research estimation and learning. The present study identifies the gender of students from their answers to a survey on information and communication technology (ICT) in European schools. The student dataset, which consists of 156 attributes and 50,478 instances, is used to identify student gender. An ensemble predictive model is developed after comparing the prediction accuracy achieved by several supervised machine learning classifiers, namely Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), Artificial Neural Network (ANN) and the J48 tree, under various k-fold cross-validation settings. The k-nearest neighbor classifier (IBk or KNN) is also trained on the dataset with varying values of k at 8-fold cross-validation. The dichotomous target variable is gender, and 131 predictors related to ICT in education are retained after applying feature reduction methods. The findings reveal that SVM achieves the highest prediction accuracy (76%) at each fold compared with the other classifiers. RF at 6-fold correctly identifies 23,535 females, and SVM at 2-fold correctly predicts 14,678 males. The authors also found that the NB classifier achieves the lowest prediction accuracy at each fold. Finally, the ensemble predictive model is built by joining the best classifiers, SVM at 2-fold, ANN at 2-fold and RF at 6-fold, to identify student gender accurately over the dataset. The ensemble confusion matrix also shows better prediction of female students than of male students from their survey responses.
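The abstract does not state how the three best classifiers are joined. As a minimal sketch, assuming a simple majority vote over their predicted labels, the ensemble step and its confusion matrix could look like the following. The function names and the toy labels are hypothetical illustrations, not the authors' implementation or data.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several classifiers by majority vote.

    per_model_predictions: list of equal-length label lists, one per model.
    With three models and two classes, a tie cannot occur.
    """
    n = len(per_model_predictions[0])
    if any(len(p) != n for p in per_model_predictions):
        raise ValueError("all models must predict the same number of instances")
    # zip(*...) groups the votes cast for each instance across models.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_model_predictions)]


def confusion_matrix(y_true, y_pred, labels=("F", "M")):
    """Return counts keyed by (actual, predicted) for every label pair."""
    counts = {(a, p): 0 for a in labels for p in labels}
    for a, p in zip(y_true, y_pred):
        counts[(a, p)] += 1
    return counts


# Hypothetical toy data: true genders and the outputs of three classifiers
# standing in for SVM, ANN and RF (assumed values, not results of the study).
y_true = ["F", "F", "F", "M", "M", "M"]
svm_pred = ["F", "F", "M", "M", "M", "F"]
ann_pred = ["F", "M", "F", "M", "F", "F"]
rf_pred = ["F", "F", "F", "F", "M", "M"]

ensemble_pred = majority_vote([svm_pred, ann_pred, rf_pred])
cm = confusion_matrix(y_true, ensemble_pred)
accuracy = (cm[("F", "F")] + cm[("M", "M")]) / len(y_true)
```

In this toy case each individual model makes two errors while the vote makes one, which illustrates why combining classifiers that err on different instances can help. Other combination rules, such as weighted voting or stacking, would be equally consistent with the abstract.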


