Index split decision tree and compositional deep neural network for text categorization

  • Abstract

    Text categorization with machine learning algorithms generally assumes a flat (horizontal) set of classes. Many advanced machine learning algorithms have been designed over the past few decades. As research on text categorization grows, it has become important to organize the research outcomes and give learners an effective machine learning method. This paper presents such a framework, called Hierarchical Decision Tree and Deep Neural Network (HDT-DNN). It investigates machine learning algorithms that create a horizontal set of classes and uses them to classify text. To this end, a novel and efficient text categorization framework based on a decision tree model categorizes text into superior and subordinate levels. The text to be categorized is represented as a tree in which the parent text category is superior to all others, and the intermediate levels represent text that is both superior and subordinate. A Deep Neural Network model is then introduced as a compositional model, in which the text to be categorized is expressed as a layered integration of primitives from the constructed decision tree model. The extra layers enable composition of features from lower layers, potentially modeling complex text with fewer units than a similarly performing shallow network, and producing a hierarchical classification. The impact of the HDT-DNN framework is evaluated through an empirical study. Extensive experiments compare HDT-DNN with existing state-of-the-art methods in terms of precision, classification accuracy, and classification time, for varying numbers of features and document sizes.
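
    The tree of superior and subordinate categories described above can be illustrated with a minimal sketch. This is not the HDT-DNN implementation: the class `CategoryNode`, the functions `describe_levels` and `classify`, the example categories, and the keyword-overlap rule used to choose a branch are all assumptions made for illustration, standing in for the learned decision tree splits of the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class CategoryNode:
    """Hypothetical node of a text-category tree (illustrative only)."""
    name: str
    keywords: FrozenSet[str] = frozenset()
    children: List["CategoryNode"] = field(default_factory=list)


def describe_levels(root: CategoryNode) -> Dict[str, str]:
    """Label each category as superior, subordinate, or both."""
    roles: Dict[str, str] = {}

    def visit(node: CategoryNode, is_root: bool) -> None:
        if is_root:
            roles[node.name] = "superior"
        elif node.children:
            roles[node.name] = "superior and subordinate"
        else:
            roles[node.name] = "subordinate"
        for child in node.children:
            visit(child, False)

    visit(root, True)
    return roles


def classify(root: CategoryNode, text: str) -> List[str]:
    """Walk the tree top-down and return the path of chosen categories.

    Assumption: keyword overlap replaces a learned split criterion.
    """
    tokens = set(text.lower().split())
    path = [root.name]
    node = root
    while node.children:
        best = max(node.children, key=lambda c: len(c.keywords & tokens))
        if not (best.keywords & tokens):
            break  # no child matches; stop at the current level
        path.append(best.name)
        node = best
    return path


# Toy hierarchy; category names and keywords are invented for the example.
tree = CategoryNode("all", children=[
    CategoryNode("science", frozenset({"quantum", "energy", "cell", "gene"}), [
        CategoryNode("physics", frozenset({"quantum", "energy", "particle"})),
        CategoryNode("biology", frozenset({"cell", "gene", "protein"})),
    ]),
    CategoryNode("sports", frozenset({"match", "goal", "team", "score"}), [
        CategoryNode("football", frozenset({"goal", "penalty"})),
        CategoryNode("tennis", frozenset({"serve", "racket"})),
    ]),
])

print(describe_levels(tree))
print(classify(tree, "the quantum particle carries energy"))
```

    In this sketch a document stops at an intermediate category when no child matches, so it can be assigned to a level that is both superior and subordinate rather than being forced into a leaf.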



  • Keywords

    Text Categorization; Machine Learning; Decision Tree; Deep Neural Network; Compositional Model; Hierarchical Classification.

Article ID: 11245
DOI: 10.14419/ijet.v7i1.1.11245

Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.