Fast Discovery of Sequential Patterns by Memory Indexing

Authors:
Ming-Yen Lin;Suh-Yin Lee
Venue:
DaWaK 2002 Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
Year:
2002

Citing 10
Cited 9

Incremental and interactive sequence mining

Proceedings of the eighth international conference on Information and knowledge management
Depth first generation of long patterns

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
FreeSpan: frequent pattern-projected sequential pattern mining

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
SPADE: an efficient algorithm for mining frequent sequences

Machine Learning
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
The PSP Approach for Mining Sequential Patterns

PKDD '98 Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Mining Algorithms for Sequential Patterns in Parallel: Hash Based Approach

PAKDD '98 Proceedings of the Second Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining

Efficient mining of sequential patterns with time constraints by delimited pattern growth

Knowledge and Information Systems
Post sequential patterns mining: a new method for discovering structural patterns

Intelligent information processing II
ARMADA - An algorithm for discovering richer relative temporal association rules from interval-based data

Data & Knowledge Engineering
Efficient mining of understandable patterns from multivariate interval time series

Data Mining and Knowledge Discovery
Unsupervised pattern mining from symbolic temporal data

ACM SIGKDD Explorations Newsletter - Special issue on data mining for health informatics
Incremental mining of sequential patterns using prefix tree

PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Analysis on repeat-buying patterns

Knowledge-Based Systems
Mining temporal patterns from sequence database of interval-based events

FSKD'06 Proceedings of the Third international conference on Fuzzy Systems and Knowledge Discovery
Discovering richer temporal association rules from interval-based data

DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery

Abstract

Mining sequential patterns is an important problem, owing to the complexity of discovering temporal patterns in sequences. Current mining approaches either require many database scans or generate several intermediate databases. As databases may fit into ever-increasing main memory, efficient memory-based discovery of sequential patterns becomes possible. In this paper, we propose a memory-indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once, to read the data sequences into memory. The find-then-index technique recursively finds the items that constitute a frequent sequence and constructs a compact index set that indicates the set of data sequences for further exploration. Through effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, the maximum total memory required, which is independent of the minimum support threshold in MEMISP, can be estimated. The experiments indicate that MEMISP outperforms both the GSP and PrefixSpan algorithms. MEMISP also has good linear scalability even with very low minimum support. When the database is too large to fit in memory in a single batch, we partition the database, mine patterns in each partition, and validate the true patterns in a second database scan. Therefore, MEMISP can efficiently mine databases of any size, for any minimum support value.
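
The find-then-index idea and the partition-then-validate fallback described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes every element of a data sequence is a single item (the paper works with general sequences), and the names mine, mine_partitioned, and is_subsequence are hypothetical, chosen only for this illustration. An index entry here is a (sequence id, position) pair, and "index advancing" moves the position forward as the pattern grows, so each recursive call looks only at the remaining suffix of each sequence.

```python
from collections import Counter


def mine(sequences, min_count):
    """Return {pattern tuple: support count} for patterns with count >= min_count.

    Assumption: each data sequence is a list of single, sortable items.
    """
    patterns = {}

    def grow(prefix, index):
        # index holds (sequence id, position) pairs; position is where the
        # leftmost match of `prefix` ends (-1 for the empty prefix).
        # Find step: count each item once per sequence, past that position.
        counts = Counter()
        for sid, pos in index:
            counts.update(set(sequences[sid][pos + 1:]))
        for item in sorted(counts):
            if counts[item] < min_count:
                continue
            pattern = prefix + (item,)
            patterns[pattern] = counts[item]
            # Index step: advance each entry to the first occurrence of
            # `item` after the old position; drop sequences without one.
            new_index = []
            for sid, pos in index:
                try:
                    new_index.append((sid, sequences[sid].index(item, pos + 1)))
                except ValueError:
                    pass
            grow(pattern, new_index)

    grow((), [(sid, -1) for sid in range(len(sequences))])
    return patterns


def is_subsequence(pattern, seq):
    """True if the items of `pattern` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def mine_partitioned(sequences, min_support, partition_size):
    """Mine each partition separately, then validate candidates in a second pass.

    min_support is a fraction in (0, 1]. A float is used for brevity; passing
    a fractions.Fraction avoids rounding at the threshold boundary.
    """
    candidates = set()
    for start in range(0, len(sequences), partition_size):
        part = sequences[start:start + partition_size]
        # A globally frequent pattern must be frequent in at least one partition.
        candidates.update(mine(part, min_support * len(part)))
    # Second pass: count every candidate against the whole database.
    threshold = min_support * len(sequences)
    result = {}
    for cand in candidates:
        count = sum(is_subsequence(cand, seq) for seq in sequences)
        if count >= threshold:
            result[cand] = count
    return result
```

For example, mine([list("abcb"), list("acb"), list("bca"), list("abc")], 3) explores the pattern ("a",) with an index of four entries, then only three entries for ("a", "b"), which mirrors the abstract's point that fewer and shorter sequences remain as patterns get longer. The partitioned variant returns the same patterns because local mining can only over-generate candidates, and the second pass filters them against the full data.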