Categorization Arabic Text Using SVM and KNN Algorithms

 
 
 
  • Abstract


    Text classification (TC) is a technique for labeling natural language texts with one or several categories from a predefined set. Two algorithms, namely support vector machine (SVM) and k-nearest neighbor (KNN), are used to study Arabic text classification. Different Arabic datasets are used to compare the two algorithms. This study is designed to classify different Arabic texts. The results show that TC using the SVM algorithm outperforms TC using KNN on all measures.
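As a rough illustration of the KNN side of the comparison described in the abstract, the following is a minimal sketch, not the authors' implementation. The abstract does not state the features, term weighting, similarity measure, or value of k, so raw term counts, cosine similarity, and k = 3 are assumptions made here. The tiny corpus, its labels, and the function names are hypothetical. The SVM side is not shown.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts from whitespace tokens (assumed preprocessing)."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Return the majority label among the k training documents most similar to text."""
    query = tf_vector(text)
    ranked = sorted(
        train,
        key=lambda pair: cosine(tf_vector(pair[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: three sports sentences and three economy sentences.
TRAIN = [
    ("فاز الفريق في المباراة", "sport"),
    ("سجل اللاعب هدفا في المباراة", "sport"),
    ("الفريق يتدرب قبل المباراة", "sport"),
    ("ارتفع سعر النفط في السوق", "economy"),
    ("البنك يرفع سعر الفائدة", "economy"),
    ("السوق يشهد ارتفاع الأسهم", "economy"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "الفريق سجل هدفا في المباراة"))
    print(knn_predict(TRAIN, "ارتفع سعر الأسهم في السوق"))
```

A practical system would likely add Arabic-specific normalization, stop-word removal, and a weighting scheme such as TF-IDF before computing similarities. Those steps are omitted here because the abstract does not describe them.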

     

     


  • Keywords


    Text Classification (TC), Support Vector Machine (SVM), K-Nearest Neighbor (KNN).



 


Article ID: 28415
 
DOI: 10.14419/ijet.v7i3.20.28415




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.