Evaluating The Impact of Dimensionality Reduction Techniques for Decision Tree Performance in Multiclass Imbalanced Datasets

  • Authors

    • S. Sridhar, Assistant Professor, Department of Computer Science and Engineering (Emerging Technologies), SRM Institute of Science and Technology, Vadapalani campus, Chennai, TN, India
    • Sridevi Srinivasan, Assistant Professor, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, TN, India
    https://doi.org/10.14419/70586478

    Received date: June 18, 2025

    Accepted date: July 7, 2025

    Published date: August 21, 2025

  • Keywords

    Classification; Feature selection; LDA; Multi-class imbalanced datasets; PCA

  • Abstract

    Imbalanced datasets are a common challenge in real-world applications, where the class of interest is often the minority. Class imbalance in multi-class datasets has received less attention than in binary datasets because the problem is more complex: several classes differ in frequency, and each carries its own associated cost. High-dimensional datasets, with numerous features, pose a further challenge in machine learning. Dimensionality reduction techniques mitigate it either by constructing new features or by selecting a subset of the original ones, which can improve classifier accuracy and reduce computational cost. Effective strategies for imbalanced learning aim to retain minority-class concepts by leveraging informative features. This study investigates the impact of two dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), on decision tree performance on multi-class imbalanced datasets, a setting commonly encountered in high-dimensional domains such as bioinformatics and medical diagnostics. Dimensionality reduction may be less effective on datasets with clear class boundaries, whereas PCA could be more effective in cases of class overlap where the majority class has more samples. Experimental results support these conclusions.
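
    The kind of comparison the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' experimental protocol: the synthetic three-class dataset, the 70/20/10 class split, the component counts, and the choice of balanced accuracy and macro F1 as metrics are all assumptions made for the example, and it relies on scikit-learn being installed.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the paper's datasets):
# three classes with roughly a 70/20/10 split and 30 features.
X, y = make_classification(
    n_samples=1500, n_features=30, n_informative=8, n_redundant=4,
    n_classes=3, n_clusters_per_class=1, weights=[0.7, 0.2, 0.1],
    class_sep=1.0, random_state=0,
)

# Stratify so the minority classes appear in both train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

def make_tree():
    """Decision tree with a fixed seed so runs are repeatable."""
    return DecisionTreeClassifier(random_state=0)

# One pipeline per setting; component counts are illustrative choices.
# LDA can yield at most (n_classes - 1) = 2 components here.
pipelines = {
    "raw features": make_pipeline(StandardScaler(), make_tree()),
    "PCA (5 components)": make_pipeline(
        StandardScaler(), PCA(n_components=5), make_tree()
    ),
    "LDA (2 components)": make_pipeline(
        StandardScaler(), LinearDiscriminantAnalysis(n_components=2), make_tree()
    ),
}

results = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    # Both metrics weight every class equally, so the majority class
    # cannot dominate the score the way it does with plain accuracy.
    results[name] = {
        "balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
        "macro_f1": f1_score(y_test, y_pred, average="macro"),
    }

for name, scores in results.items():
    print(f"{name:20s} balanced_accuracy={scores['balanced_accuracy']:.3f} "
          f"macro_f1={scores['macro_f1']:.3f}")
```

    Which setting scores best depends on the data. Changing `class_sep` (class overlap) or `weights` (degree of imbalance) in the generator is one way to probe the overlap and imbalance conditions the abstract refers to.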

  • How to Cite

    Sridhar, S., & Srinivasan, S. (2025). Evaluating The Impact of Dimensionality Reduction Techniques for Decision Tree Performance in Multiclass Imbalanced Datasets. International Journal of Basic and Applied Sciences, 14(4), 602-608. https://doi.org/10.14419/70586478