A heuristic to predict the optimal pattern-growth direction for the pattern growth-based sequential pattern mining approach

 
 
 
  • Abstract
  • Keywords
  • References
  • PDF
  • Abstract


    Sequential pattern mining is an efficient technique for discovering recurring structures or patterns from very large datasets, with a very large field of applications. It aims at extracting a set of attributes, shared across time among a large number of objects in a given database. Previous studies have developed two major classes of sequential pattern mining methods, namely, the candidate generation-and-test approach based on either vertical or horizontal data formats represented respectively by GSP and SPADE, and the pattern-growth approach represented by FreeSpan, PrefixSpan and their further extensions. The performances of these algorithms depend on how patterns grow. Because of this, we introduce a heuristic to predict the optimal pattern-growth direction, i.e. the pattern-growth direction leading to the best performance in terms of runtime and memory usage. Then, we perform a number of experimentations on both real-life and synthetic datasets to test the heuristic. The performance analysis of these experimentations show that the heuristic prediction is reliable in general.


  • Keywords


    Sequence Mining; Sequential Pattern; Frequent Pattern; Pattern-Growth Direction; Heuristic; Prefixspan; Suffixspan.

  • References


      [1] R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. Pages 207-216, 1993.

      [2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12-15, 1994, Santiago de Chile, Chile, pages 487-499, 1994.

      [3] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering, March 6-10, 1995, Taipei, Taiwan, pages 3-14, 1995.

      [4] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu. Sequential pattern mining using a bitmap representation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada, pages 429-435, 2002.

      [5] T. Dam, K. Li, P. Fournier-Viger, and Q. Duong. An efficient algorithm for mining top-rank-k frequent patterns. Appl. Intell., 45(1):96-111, 2016.

      [6] M. El-Sayed, C. Ruiz, and E. A. Rundensteiner. Fs-miner: efficient and incremental mining of frequent sequence patterns in web logs. In Sixth ACM CIKM International Workshop on Web Information and Data Management (WIDM 2004), Washington, DC, USA, November 12-13, 2004, pages 128-135, 2004.

      [7] P. Fournier-Viger, A. Gomariz, T. Gueniche, A. Soltani, C. Wu, and V. S. Tseng. SPMF: a java open-source pattern mining library. Journal of Machine Learning Research, 15(1):3389-3393, 2014.

      [8] M. N. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: sequential pattern mining with regular expression constraints. In VLDB’99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 223-234, 1999.

      [9] K. Gouda, M. Hassaan, and M. J. Zaki. Prism: A primal-encoding approach for frequent sequence mining. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), October 28- 31, 2007, Omaha, Nebraska, USA, pages 487-492, 2007.

      [10] K. Gouda, M. Hassaan, and M. J. Zaki. Prism: An effective approach for frequent sequence mining via prime-block encoding. J. Comput. Syst. Sci., 76(1):88-102, 2010.

      [11] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. Freespan: frequent pattern-projected sequential pattern mining. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, Boston, MA, USA, August 20-23, 2000, pages 355-359, 2000.

      [12] C. Hsieh, D. Yang, and J. Wu. An efficient sequential pattern mining algorithm based on the 2 sequence matrix. In Workshops Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 15-19, 2008, Pisa, Italy, pages 583-591, 2008.

      [13] N. R. Mabroukeh and C. I. Ezeife. A taxonomy of sequential pattern mining algorithms. ACM Comput. Surv., 43(1):3, 2010.

      [14] F. Masseglia, F. Cathala, and P. Poncelet. The PSP approach for mining sequential patterns. In Principles of Data Mining and Knowledge Discovery, Second European Symposium, PKDD ’98, Nantes, France, September 23-26, 1998, Proceedings, pages 176-184, 1998.

      [15] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Prefixspan: Mining sequential patterns by prefix-projected growth. In Proceedings of the 17th International Conference on Data Engineering, April 2-6, 2001, Heidelberg, Germany, pages 215-224, 2001.

      [16] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Trans. Knowl. Data Eng., 16(11):1424-1440, 2004.

      [17] J. Pei, J. Han, B. Mortazavi-Asl, and H. Zhu. Mining access patterns efficiently from web logs. In Knowledge Discovery and Data Mining, Current Issues and New Applications, 4th Pacific-Asia Conference, PADKK 2000, Kyoto, Japan, April 18-20, 2000, Proceedings, pages 396-407, 2000.

      [18] L. Savary and K. Zeitouni. Indexed bit map (IBM) for mining frequent sequences. In Knowledge Discovery in Databases: PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings, pages 659-666, 2005.

      [19] M. Seno and G. Karypis. Slpminer: An algorithm for finding frequent sequential patterns using length-decreasing support constraint. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 9-12 December 2002, Maebashi City, Japan, pages 418-425, 2002.

      [20] Z. Yang and M. Kitsuregawa. LAPIN-SPAM: an improved algorithm for mining sequential pattern. In Proceedings of the 21st International Conference on Data Engineering Workshops, ICDE 2005, 5-8 April 2005, Tokyo, Japan, page 1222, 2005.

      [21] Z. Yang, Y. Wang, and M. Kitsuregawa. LAPIN: effective sequential pattern mining algorithms by last position induction for dense databases. In Advances in Databases: Concepts, Systems and Applications, 12th International Conference on Database Systems for Advanced Applications, DASFAA 2007, Bangkok, Thailand, April 9-12, 2007, Proceedings, pages 1020-1023, 2007.

      [22] M. J. Zaki. Sequence mining in categorical domains: Incorporating constraints. In Proceedings of the 2000 ACM CIKM International Conference on Information and Knowledge Management, McLean, VA, USA, November 6-11, 2000, pages 422-429, 2000.

      [23] M. J. Zaki. SPADE: an efficient algorithm for mining frequent sequences. Machine Learning,

 

View

Download

Article ID: 7011
 
DOI: 10.14419/jacst.v6i2.7011




Copyright © 2012-2015 Science Publishing Corporation Inc. All rights reserved.