Unified Framework of Dimensionality Reduction and Text Categorisation

  • Authors

    • K. M.M Rajashekharaiah
    • Sunil S Chikkalli
    • Prateek K Kumbar
    • Dr. P. Suryanarayana Babu
    https://doi.org/10.14419/ijet.v7i3.29.21397
  • Keywords: Classification accuracy, Classifier, Dimension Reduction, Framework, Supervised learning, Support Vector Machine (SVM), Text Classification/Categorisation (TC)
  • Text classification (categorization) is a supervised learning task that assigns text documents to predefined classes. It is used to organize and manage collections of text documents available in digital form. The support vector machine (SVM) is widely regarded as a suitable classifier for this task across many kinds of applications. Although the computational complexity of an SVM is independent of the number of dimensions, high dimensionality still poses the problem of the ‘curse of dimensionality’, which can be addressed effectively through Dimension Reduction (DR). This work develops a framework for dimensionality reduction and text classification. It compares the classification accuracies of two approaches: text classification with dimensionality reduction and text classification without it. It also evaluates the efficiency of various dimensionality reduction techniques in order to include one of the most suitable methods in the framework.
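
    The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn is available, uses a made-up toy corpus, and picks TF-IDF features with truncated SVD (latent semantic analysis) as one possible dimension reduction step. The names `evaluate`, `baseline` and `reduced` are hypothetical.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus for illustration only (assumption: not the paper's data set).
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the game",
    "fans cheered as the team lifted the cup",
    "the new processor speeds up the laptop",
    "the software update fixes a security bug",
    "developers released a new programming library",
    "the server stores data on a fast disk",
]
train_labels = ["sport"] * 4 + ["tech"] * 4
test_docs = ["the players scored in the match", "a bug in the software library"]
test_labels = ["sport", "tech"]

N_COMPONENTS = 2  # assumed target dimensionality for the reduced approach


def evaluate(pipeline):
    """Fit the pipeline on the training split and return test accuracy."""
    pipeline.fit(train_docs, train_labels)
    return pipeline.score(test_docs, test_labels)


# Approach 1: text classification without dimensionality reduction.
baseline = make_pipeline(TfidfVectorizer(), LinearSVC())

# Approach 2: text classification with dimensionality reduction (LSA via SVD).
reduced = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=N_COMPONENTS, random_state=0),
    LinearSVC(),
)

acc_baseline = evaluate(baseline)
acc_reduced = evaluate(reduced)

n_terms = len(baseline.named_steps["tfidfvectorizer"].vocabulary_)
print(f"without DR: {n_terms} features, accuracy = {acc_baseline:.2f}")
print(f"with DR:    {N_COMPONENTS} features, accuracy = {acc_reduced:.2f}")
```

    On a corpus this small the accuracies carry no meaning; the point is only the structure of the comparison, where both pipelines share the same features and classifier and differ in the reduction step alone.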

     

  • How to Cite

    Rajashekharaiah, K. M. M., Chikkalli, S. S., Kumbar, P. K., & Suryanarayana Babu, P. (2018). Unified Framework of Dimensionality Reduction and Text Categorisation. International Journal of Engineering & Technology, 7(3.29), 648-654. https://doi.org/10.14419/ijet.v7i3.29.21397