A Review of Different Text Categorization Techniques

  • Abstract

    This paper addresses a major problem on the internet: the huge volume of uncategorized text. We review existing techniques for feature selection and text categorization. The review reveals several gaps in existing algorithms, one of which is the need for a labeled dataset to train the classifier.



  • Keywords

    Bayesian; KNN; PCA; SVM; TF-IDF





Article ID: 15210
DOI: 10.14419/ijet.v7i3.8.15210

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.