A Hybrid Feature Selection Technique for Classification of Group-based Holy Quran Verses

  • Abstract

    Text classification is primarily applied to document labeling. Existing feature selection (FS) techniques, however, have two major drawbacks: wrapper-based FS techniques incur high computational runtime, while filter-based FS techniques yield low classification accuracy. This paper proposes a hybrid feature selection technique that combines filter-based information gain (IG) with the wrapper-based CFS algorithm. The aim of the combination is to achieve high classification accuracy (associated with wrappers) at lower computational runtime (associated with filters). The proposed IG-CFS technique is applied to label Quranic verses of al-Baqara and al-Anaam using two major references: the English translation and the commentary (tafsir). The textual data were preprocessed using StringToWordVector with weighted TF-IDF, and four classifiers were evaluated: naïve Bayes, libSVM, k-NN, and decision trees (J48). The proposed IG-CFS technique achieved the highest overall classification accuracy of 94.5% at a runtime of 3.89 seconds.
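
The filter-then-refine idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first stage ranks terms by information gain over binary term presence. The second stage is a generic greedy forward wrapper scored by leave-one-out 1-nearest-neighbour accuracy, used here as a stand-in for the paper's CFS stage, which is not reproduced. The toy documents, labels, and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the labels."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

def ig_filter(docs, labels, k):
    """Filter stage: keep the k terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]

def loo_accuracy(docs, labels, terms):
    """Leave-one-out accuracy of a 1-NN classifier (Hamming distance) on `terms`."""
    vectors = [tuple(t in d for t in terms) for d in docs]
    correct = 0
    for i, v in enumerate(vectors):
        nearest = min((j for j in range(len(vectors)) if j != i),
                      key=lambda j: sum(a != b for a, b in zip(v, vectors[j])))
        correct += labels[nearest] == labels[i]
    return correct / len(docs)

def greedy_wrapper(docs, labels, candidates):
    """Wrapper stage (assumed stand-in for CFS): add terms while accuracy improves."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for term in candidates:
            if term in selected:
                continue
            score = loo_accuracy(docs, labels, selected + [term])
            if score > best:
                best, choice, improved = score, term, True
        if improved:
            selected.append(choice)
    return selected, best

def hybrid_select(docs, labels, k):
    """Run the cheap filter first, then the costlier wrapper on the survivors only."""
    return greedy_wrapper(docs, labels, ig_filter(docs, labels, k))

# Hypothetical toy corpus: each document is a set of tokens.
DOCS = [
    {"prayer", "fast", "charity"},
    {"prayer", "pilgrimage"},
    {"fast", "prayer"},
    {"trade", "debt", "witness"},
    {"debt", "contract"},
    {"trade", "contract", "witness"},
]
LABELS = ["worship", "worship", "worship", "law", "law", "law"]

if __name__ == "__main__":
    print(hybrid_select(DOCS, LABELS, 3))
```

The design point is that the wrapper's repeated classifier evaluations run only over the k terms that survive the filter, which is where a hybrid scheme would be expected to save runtime relative to a wrapper applied to the full vocabulary.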



  • Keywords

    Feature Selection Techniques; Holy Quran; Text Classification Algorithms; AUC; ROC Curve





Article ID: 23372
DOI: 10.14419/ijet.v7i4.31.23372

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.