We present a reliable, universal method for ranking sequential patterns (itemset-sequences) by significance in frequent sequential pattern mining. We first build a probabilistic reference model for the collection of itemset-sequences and then derive an analytical formula for the frequency of sequential patterns under that model. We rank sequential patterns by the divergence between their actual frequencies and their frequencies in the reference model. We demonstrate the method by discovering dependencies between streams of news stories, expressed as significant sequential patterns, which is an important problem in multi-stream text mining and in topic detection and tracking research.
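The ranking idea can be illustrated with a small Python sketch. It is not the formula derived in the paper: as a stand-in, it assumes a reference model in which items occur independently in every itemset, and it measures divergence with a plain z-score. The function names (`item_probs`, `occurrence_prob`, `contains`, `significance`) are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import sqrt


def item_probs(sequences):
    """Estimate P(item appears in an itemset) over all itemsets in the collection."""
    counts = Counter()
    total = 0
    for seq in sequences:
        for itemset in seq:
            counts.update(set(itemset))
            total += 1
    return {item: c / total for item, c in counts.items()}


def occurrence_prob(pattern, length, probs):
    """P(pattern is a subsequence of a random sequence of `length` itemsets).

    Assumption (not from the paper): items are independent, so one random
    itemset contains pattern element j with probability q[j].
    """
    q = []
    for element in pattern:
        p = 1.0
        for item in element:
            p *= probs.get(item, 0.0)
        q.append(p)
    m = len(pattern)
    # state[j] = probability that exactly the first j elements are matched so far
    state = [1.0] + [0.0] * m
    for _ in range(length):
        nxt = state[:]
        for j in range(m):
            move = state[j] * q[j]  # greedy match advances by at most one element
            nxt[j] -= move
            nxt[j + 1] += move
        state = nxt
    return state[m]


def contains(seq, pattern):
    """True if the pattern's itemsets occur, in order, as subsets of itemsets in seq."""
    j = 0
    for itemset in seq:
        if j < len(pattern) and set(pattern[j]) <= set(itemset):
            j += 1
    return j == len(pattern)


def significance(pattern, sequences, probs):
    """Z-score of the observed support against the reference-model expectation."""
    observed = sum(contains(s, pattern) for s in sequences)
    ps = [occurrence_prob(pattern, len(s), probs) for s in sequences]
    expected = sum(ps)
    var = sum(p * (1 - p) for p in ps)
    return (observed - expected) / sqrt(var) if var > 0 else 0.0
```

Candidate patterns can then be sorted by `significance`, so that a pattern whose support far exceeds what independence predicts ranks above one that is merely frequent because its items are common.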