Efficient mining of correlated sequential patterns based on null hypothesis

  • Authors:
  • Cindy Xide Lin;Ming Ji;Marina Danilevsky;Jiawei Han

  • Affiliations:
  • Twitter Inc, San Francisco, CA, USA & University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA

  • Venue:
  • Proceedings of the 2012 international workshop on Web-scale knowledge representation, retrieval and reasoning
  • Year:
  • 2012

Abstract

Frequent pattern mining has been a widely studied topic in data mining research for more than a decade. However, pattern mining on real data sets is complicated: a huge number of co-occurrence patterns are usually generated, the majority of which are either redundant or uninformative. The true correlation relationships among data objects are buried deep in a large pile of useless information. To overcome this difficulty, mining correlations has been recognized as an important data mining task with many advantages over mining frequent patterns. In this paper, we formally propose and define the task of mining frequent correlated sequential patterns from a sequential database. With this aim in mind, we re-examine various interestingness measures to select the appropriate one(s), which can disclose succinct relationships among sequential patterns. We then propose PSBSpan, an efficient algorithm built on the pattern-growth framework, that mines frequent correlated sequential patterns. Our experimental study on real datasets shows that our algorithm has outstanding performance in terms of both efficiency and effectiveness.
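For readers unfamiliar with the pattern-growth methodology mentioned in the abstract, the following is a minimal sketch of generic prefix-projection pattern growth (in the style of PrefixSpan) over sequences of single items. It is not PSBSpan: the abstract does not describe that algorithm's details or the interestingness measure it uses, so this sketch filters by support only. The function name `prefixspan`, the parameter `min_support`, and the toy database `db` are assumptions made for illustration.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every frequent sequential pattern.

    Generic pattern-growth sketch (support only), NOT the paper's PSBSpan.
    Each sequence is a list of single items.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffix of
        # each sequence that follows the first occurrence of the prefix.
        counts = defaultdict(int)
        for sid, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[sid][start:]):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support

            # Build the projected database for the extended pattern.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((sid, pos + 1))
            grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy database, for illustration only.
db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
patterns = prefixspan(db, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

A correlation-aware miner would additionally score each candidate pattern with an interestingness measure before reporting it; which measure to use, and how to prune with it, is what the paper itself addresses.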