Index split decision tree and compositional deep neural network for text categorization

  • Authors

    • N Ravikumar
    • Dr P. Tamil Selvan
    2017-12-21
    https://doi.org/10.14419/ijet.v7i1.1.9953
  • Keywords: Text Categorization, Machine Learning, Decision Tree, Deep Neural Network, Compositional Model, Hierarchical Classification
  • Abstract

    Text categorization with machine learning algorithms is generally assumed to operate over a flat (horizontal) set of classes, and several advanced machine learning algorithms have been designed for this setting in the past few decades. With the growing body of research on text categorization, it has become important to organize the research outcomes and provide learners with an effective machine learning method; to this end, a framework called Hierarchical Decision Tree and Deep Neural Network (HDT-DNN) is proposed. It investigates machine learning algorithms that produce a horizontal set of classes and employs them for the classification of text. With this objective, a novel and efficient text categorization framework based on a decision tree model is used to categorize text into superior and subordinate levels. The text to be categorized is represented as a tree, with the parent text category being superior to all others, while the intermediate levels represent text that is both superior and subordinate. A Deep Neural Network model is then presented, initiating a compositional model in which the text to be categorized is treated as a layered integration of primitives drawn from the constructed decision tree model. The extra layers enable the composition of features from lower layers, potentially modeling complex text with fewer units than a comparable shallow network and producing a hierarchical classification. The significance of the HDT-DNN framework is evaluated through an empirical study. Extensive experiments are carried out, and the performance of the HDT-DNN framework is evaluated and compared with existing state-of-the-art methods using precision, classification accuracy, and classification time, with respect to varying numbers of features and document sizes.

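    As a rough illustration of the idea described in the abstract, the minimal Python sketch below builds a toy two-level category tree (superior and subordinate classes) and stacks small dense layers so that the subordinate decision is made from features composed on top of the superior-level representation. The category tree, vocabulary, bag-of-words features, layer sizes, and randomly initialised weights are assumptions made purely for demonstration; this is not the authors' implementation of HDT-DNN.

      # Illustrative sketch only: a toy hierarchical (superior/subordinate) text
      # classifier loosely following the compositional idea in the abstract.
      # The category tree, vocabulary, and weights below are assumptions.
      import numpy as np

      rng = np.random.default_rng(0)

      # Hypothetical two-level category tree: superior classes -> subordinate classes.
      CATEGORY_TREE = {
          "science":  ["physics", "biology"],
          "business": ["finance", "marketing"],
      }
      SUPERIOR = list(CATEGORY_TREE)
      VOCAB = ["atom", "cell", "stock", "brand", "market", "energy"]

      def bag_of_words(text, vocab):
          """Very small bag-of-words feature vector (assumed feature scheme)."""
          tokens = text.lower().split()
          return np.array([tokens.count(w) for w in vocab], dtype=float)

      def dense(x, w, b):
          """One fully connected layer with ReLU, used to compose features."""
          return np.maximum(0.0, x @ w + b)

      def softmax(z):
          e = np.exp(z - z.max())
          return e / e.sum()

      # Randomly initialised weights stand in for a trained network.
      W1 = rng.normal(scale=0.5, size=(len(VOCAB), 8)); b1 = np.zeros(8)
      W_sup = rng.normal(scale=0.5, size=(8, len(SUPERIOR))); b_sup = np.zeros(len(SUPERIOR))
      W2 = rng.normal(scale=0.5, size=(8, 8)); b2 = np.zeros(8)
      W_sub = {s: rng.normal(scale=0.5, size=(8, len(subs)))
               for s, subs in CATEGORY_TREE.items()}

      def classify(text):
          x = bag_of_words(text, VOCAB)
          h1 = dense(x, W1, b1)                    # lower-layer features
          sup_probs = softmax(h1 @ W_sup + b_sup)  # superior-level decision
          sup = SUPERIOR[int(sup_probs.argmax())]
          h2 = dense(h1, W2, b2)                   # composed higher-level features
          subs = CATEGORY_TREE[sup]
          sub_probs = softmax(h2 @ W_sub[sup])     # subordinate decision within branch
          return sup, subs[int(sub_probs.argmax())]

      print(classify("the stock market and brand energy"))

    The point of the sketch is the control flow: the subordinate classifier only sees candidates under the predicted superior class, and its input is a feature layer composed from the superior level's hidden representation rather than the raw text features.
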
  • How to Cite

    Ravikumar, N., & Tamil Selvan, P. (2017). Index split decision tree and compositional deep neural network for text categorization. International Journal of Engineering & Technology, 7(1.1), 449-455. https://doi.org/10.14419/ijet.v7i1.1.9953

    Received date: 2018-03-08

    Accepted date: 2018-03-08

    Published date: 2017-12-21