Efficient mining of correlated sequential patterns based on null hypothesis

Authors:
Cindy Xide Lin; Ming Ji; Marina Danilevsky; Jiawei Han
Affiliations:
Twitter Inc, San Francisco, CA, USA & University of Illinois at Urbana-Champaign, Urbana, IL, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 2012 international workshop on Web-scale knowledge representation, retrieval and reasoning
Year:
2012

Abstract

Frequent pattern mining has been widely studied in data mining for more than a decade. However, pattern mining on real datasets is complicated: a huge number of co-occurrence patterns is usually generated, most of which are either redundant or uninformative, so the true correlation relationships among data objects are buried in a large pile of useless information. To overcome this difficulty, mining correlations has been recognized as an important data mining task, with many advantages over mining frequent patterns. In this paper, we formally propose and define the task of mining frequent correlated sequential patterns from a sequential database. To this end, we re-examine various interestingness measures and select the appropriate one(s) for disclosing succinct relationships among sequential patterns. We then propose PSBSpan, an efficient algorithm built on the pattern-growth framework that mines frequent correlated sequential patterns. Our experimental study on real datasets shows that the algorithm achieves outstanding performance in terms of both efficiency and effectiveness.
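The abstract does not spell out PSBSpan or the measure(s) the authors finally select, so what follows is only a minimal sketch under our own assumptions. It illustrates the general idea of pairing pattern growth with a null-hypothesis score. Frequent sequential patterns are mined with a PrefixSpan-style recursion over sequences of single items, with support counted once per sequence. Each pattern is then scored by a simple lift ratio: its observed support divided by the support expected if its items occurred independently. The lift score, the function names `prefixspan`, `lift` and `correlated_patterns`, and the `min_count` and `min_lift` thresholds are hypothetical stand-ins, not the authors' algorithm or measure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A sequential pattern is an ordered tuple of items, e.g. ("a", "b").
Pattern = Tuple[str, ...]


def prefixspan(db: List[Sequence[str]], min_count: int) -> Dict[Pattern, int]:
    """Return every sequential pattern contained in at least `min_count` sequences.

    PrefixSpan-style pattern growth over sequences of single items: each
    frequent prefix is extended using only its projected database, i.e. the
    suffixes that follow the prefix's leftmost match in each sequence.
    """
    results: Dict[Pattern, int] = {}

    def grow(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # `projected` holds (sequence id, offset where the remaining suffix starts).
        counts: Dict[str, int] = defaultdict(int)
        for sid, start in projected:
            # Count each item at most once per sequence (support = #sequences).
            for item in set(db[sid][start:]):
                counts[item] += 1
        for item, count in counts.items():
            if count < min_count:
                continue  # anti-monotonicity: no extension can be frequent either
            new_prefix = prefix + (item,)
            results[new_prefix] = count
            # Project again: keep the suffix after the first occurrence of `item`.
            new_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            grow(new_prefix, new_projected)

    grow((), [(sid, 0) for sid in range(len(db))])
    return results


def lift(pattern: Pattern, counts: Dict[Pattern, int], n: int) -> float:
    """Observed support of `pattern` divided by the support expected under a
    naive null hypothesis that its items occur independently.
    Hypothetical stand-in measure; it ignores item order."""
    observed = counts[pattern] / n
    expected = 1.0
    for item in pattern:
        # Every item of a frequent pattern is itself frequent, so this key exists.
        expected *= counts[(item,)] / n
    return observed / expected


def correlated_patterns(
    db: List[Sequence[str]], min_count: int, min_lift: float
) -> Dict[Pattern, float]:
    """Frequent patterns of length >= 2 whose lift reaches `min_lift`."""
    counts = prefixspan(db, min_count)
    n = len(db)
    scored: Dict[Pattern, float] = {}
    for pattern in counts:
        if len(pattern) < 2:
            continue
        score = lift(pattern, counts, n)
        if score >= min_lift:
            scored[pattern] = score
    return scored


if __name__ == "__main__":
    db = [["a", "b", "f"], ["a", "b", "f"], ["f", "c"], ["c", "f"], ["f", "d"], ["d", "f"]]
    for pattern, score in sorted(correlated_patterns(db, min_count=2, min_lift=1.5).items()):
        print(pattern, round(score, 2))
```

On the toy database above, ("a", "b"), ("a", "f") and ("b", "f") are equally frequent, each contained in 2 of 6 sequences. Only ("a", "b") has a lift of about 3. The other two score exactly 1.0 because "f" occurs in every sequence. This is the frequent-but-uninformative case the abstract describes. The redundant extension ("a", "b", "f") also passes the threshold, which suggests why a plain lift filter is not enough on its own and why the choice of measure matters.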