A Review of Different Text Categorization Techniques

  • Authors

    • Anubhav Aggarwal
    • Jasmeet Singh
    • Dr Kapil Gupta
    2018-07-07
    https://doi.org/10.14419/ijet.v7i3.8.15210
  • Bayesian, KNN, PCA, SVM, TF-IDF
  • This paper addresses a major problem on the internet: the huge volume of uncategorized text. We review existing techniques for feature selection and text categorization. The review reveals several gaps in existing algorithms, one of which is the need for a labeled dataset to train the classifier.

     

     


  • How to Cite

    Aggarwal, A., Singh, J., & Gupta, K. (2018). A Review of Different Text Categorization Techniques. International Journal of Engineering & Technology, 7(3.8), 11-15. https://doi.org/10.14419/ijet.v7i3.8.15210