Index split decision tree and compositional deep neural network for text categorization
Keywords: Text Categorization, Machine Learning, Decision Tree, Deep Neural Network, Compositional Model, Hierarchical Classification
Text categorization with machine learning algorithms generally assumes a horizontal (flat) set of classes. Several advanced machine learning algorithms have been designed in the past few decades. As research on text categorization grows, it has become important to organize the research outcomes and provide learners with an effective machine learning method. This work presents such a framework, called Hierarchical Decision Tree and Deep Neural Network (HDT-DNN). It investigates machine learning algorithms that create a horizontal set of classes and applies them to the classification of text. With this objective, a novel and efficient text categorization framework based on a decision tree model is used to categorize text according to superior and subordinate levels. The text to be categorized is represented as a tree in which the parent text category is superior to all others, and the intermediate levels hold categories that are both superior and subordinate. A Deep Neural Network model is then presented as a compositional model, in which the text to be categorized is expressed as a layered composition of primitives from the constructed decision tree model. The extra layers enable composition of features from lower layers, potentially modeling complex text with fewer units than a similarly performing shallow network, and produce a hierarchical classification. The impact of the HDT-DNN framework is evaluated through an empirical study. Extensive experiments are carried out, and the performance of the HDT-DNN framework is compared with existing state-of-the-art methods in terms of precision, classification accuracy, and classification time, for varying numbers of features and document sizes.
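As a rough illustration of the superior/subordinate category tree and of two of the evaluation measures named above, the following Python sketch walks a small tree from the parent category down to a leaf. It is not the HDT-DNN method itself: the deep network stage is omitted, a simple keyword-overlap count stands in for any learned classifier, and the names `CategoryNode`, `classify`, `precision`, `accuracy`, and the example categories and vocabularies are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """One node of a category tree; a node is superior to its children."""
    name: str
    keywords: frozenset = frozenset()  # hypothetical per-category vocabulary
    children: list = field(default_factory=list)


def score(node, tokens):
    """Toy stand-in for a learned classifier: count keyword hits."""
    return len(node.keywords & tokens)


def classify(root, text):
    """Descend from the root, taking the best-scoring child at each level.

    Returns the category path from superior to subordinate and stops
    early when no child shares a token with the text.
    """
    tokens = set(text.lower().split())
    path = [root.name]
    node = root
    while node.children:
        best = max(node.children, key=lambda child: score(child, tokens))
        if score(best, tokens) == 0:
            break
        path.append(best.name)
        node = best
    return path


def precision(predicted, actual, label):
    """Fraction of documents predicted as `label` that truly carry it."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == label and a == label)
    total = sum(1 for p in predicted if p == label)
    return hits / total if total else 0.0


def accuracy(predicted, actual):
    """Fraction of documents whose predicted label equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Assumed example tree: a root, two intermediate categories, four leaves.
ROOT = CategoryNode("all", children=[
    CategoryNode("sports", frozenset({"team", "match", "goal"}), [
        CategoryNode("football", frozenset({"goal", "penalty"})),
        CategoryNode("tennis", frozenset({"serve", "racket"})),
    ]),
    CategoryNode("finance", frozenset({"stock", "market", "bank"}), [
        CategoryNode("equities", frozenset({"stock", "shares"})),
        CategoryNode("banking", frozenset({"bank", "loan"})),
    ]),
])
```

Calling `classify(ROOT, "The team scored a late penalty goal")` descends through the intermediate category to a leaf, so the result is a path rather than a single flat label. In a fuller implementation the `score` function would be replaced by a trained model at each node.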