RCuA: rule classification use association data mining model for structure and unstructured data

  • Authors

    • Mohammed Hayel Refai, Sur University College
    • Saleh Ali Alomari, Jadara University
    • Tarik Khalil, Sur University College
    • Hussam Saleh Abu Karaki, Al Hussein Bin Talal University
    2019-04-07
    https://doi.org/10.14419/ijet.v7i4.28379
  • Text Categorization, Associative Classification, CBA, MCAR, RCuA, UCI, Reuters-21578.
  • Association rule mining and classification rule mining are important tasks in data mining. Integrating association rule discovery with classification yields a method known as associative classification (AC). Text categorization (TC) is one of the major problems in this domain and in the machine learning community. It is difficult to solve because the available data has very high dimensionality. Large collections of online documents exist in which each document is associated with a particular class. Categorization refers to building a model from such labeled data that classifies previously unseen documents as accurately as possible. This paper proposes a novel text classification model based on AC, namely Rule Classification use Association (RCuA), which produces an obvious text document. The paper also extends existing associative text classifiers so that they cope with both structured and unstructured English document collections. The model is evaluated in two experiments, one on structured data and one on unstructured data: the first uses the UCI datasets and the second uses the Reuters-21578 dataset. To assess its efficiency, the proposed model is compared against other classification learning algorithms (e.g. MCAR and CBA). The findings show that RCuA improves accuracy over the MCAR and CBA algorithms while decreasing the number of rules. On the structured datasets, RCuA achieves an average accuracy of 83.945%, compared with 82.34% for CBA and 83.655% for MCAR. On the unstructured dataset, RCuA achieves an average accuracy of 89.328%, compared with 77.34% for CBA and 83.64286% for MCAR.
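To make the general idea of associative classification concrete, the following is a minimal sketch in Python. It is not the RCuA algorithm, nor MCAR or CBA. It only illustrates the common pattern of mining class association rules that pass support and confidence thresholds, ranking them, and classifying a new document with the first matching rule. The function names (`mine_rules`, `classify`), the thresholds, the ranking order, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_rules(rows, labels, min_support=0.3, min_confidence=0.6, max_len=2):
    """Mine rules of the form itemset -> label (illustrative, brute force)."""
    n = len(rows)
    itemset_counts = Counter()  # occurrences of each itemset
    rule_counts = Counter()     # occurrences of each (itemset, label) pair
    for items, label in zip(rows, labels):
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(items), k):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), count in rule_counts.items():
        support = count / n                          # share of all rows
        confidence = count / itemset_counts[itemset]  # share of matching rows
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))

    # Assumed ranking: higher confidence, then higher support, then shorter rule.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(rules, items, default):
    """Return the label of the first ranked rule whose itemset is contained in items."""
    for itemset, label, _support, _confidence in rules:
        if set(itemset) <= set(items):
            return label
    return default  # no rule matched


# Hypothetical toy documents, each reduced to a set of terms.
rows = [
    {"wheat", "export"},
    {"wheat", "harvest"},
    {"oil", "export"},
    {"oil", "barrel"},
]
labels = ["grain", "grain", "crude", "crude"]

rules = mine_rules(rows, labels)
print(classify(rules, {"wheat", "price"}, default="other"))  # grain
print(classify(rules, {"oil", "tanker"}, default="other"))   # crude
print(classify(rules, {"gold"}, default="other"))            # other
```

In this toy run only `wheat -> grain` and `oil -> crude` survive the thresholds, which shows how pruning by support and confidence keeps the rule set small. Published algorithms differ mainly in how they discover, rank, and prune rules, and the sketch makes no claim about how RCuA does so.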

     

     

  • How to Cite

    Hayel Refai, M., Ali Alomari, S., Khalil, T., & Saleh Abu Karaki, H. (2019). RCuA: rule classification use association data mining model for structure and unstructured data. International Journal of Engineering & Technology, 7(4), 5659-5665. https://doi.org/10.14419/ijet.v7i4.28379