FAST sequence mining based on sparse id-lists

Authors:
Eliana Salvemini; Fabio Fumarola; Donato Malerba; Jiawei Han
Affiliations:
Computer Science Dept., Univ. of Bari, Bari, Italy; Computer Science Dept., Univ. of Bari, Bari, Italy; Computer Science Dept., Univ. of Bari, Bari, Italy; Computer Science Dept., Univ. of Illinois at Urbana-Champaign, Urbana, IL
Venue:
ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems
Year:
2011

Abstract

Sequential pattern mining is an important data mining task with applications in market basket analysis, the World Wide Web, medicine and telecommunications. The task is challenging because sequence databases are usually large, containing many long sequences, and the number of possible sequential patterns can grow exponentially. We propose a new sequential pattern mining algorithm, FAST, which represents the dataset with indexed sparse id-lists so that the support of sequential patterns can be counted quickly. We also use a lexicographic tree to make candidate generation more efficient. FAST mines the complete set of sequential patterns while greatly reducing the effort spent on support counting and on candidate sequence generation. Experimental results on artificial and real data show that our method outperforms existing methods in the literature by up to one or two orders of magnitude on large datasets.
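The abstract does not spell out FAST's exact data layout, so the following is only a minimal sketch, under our own assumptions, of the two general ideas it names. Each item gets a sparse id-list: only sequences that contain the item get an entry, holding the itemset positions where it occurs. The support of a pattern is then read off as the number of sequences in its id-list. Candidates are enumerated depth-first along a lexicographic sequence tree using sequence extensions (append a new itemset) and itemset extensions (add an item to the last itemset). The names `item_idlists`, `s_extend`, `i_extend` and `mine` are hypothetical. This is a plain SPADE/SPAM-style id-list join and omits the indexing that FAST adds on top of its sparse id-lists.

```python
from typing import Dict, List, Set, Tuple

Seq = List[Set[str]]                   # a sequence is an ordered list of itemsets
IdList = Dict[int, List[int]]          # sequence id -> sorted positions where the pattern ends
Pattern = Tuple[Tuple[str, ...], ...]  # tuple of itemsets, each itemset a sorted tuple


def item_idlists(db: List[Seq]) -> Dict[str, IdList]:
    """Sparse id-list per item: only sequences containing the item get an entry."""
    lists: Dict[str, IdList] = {}
    for sid, seq in enumerate(db):
        for pos, itemset in enumerate(seq):
            for item in itemset:
                # positions are visited in increasing order, so each list stays sorted
                lists.setdefault(item, {}).setdefault(sid, []).append(pos)
    return lists


def s_extend(prefix: IdList, item: IdList) -> IdList:
    """Sequence extension <P, {x}>: x must occur strictly after some end of P."""
    out: IdList = {}
    for sid, ends in prefix.items():
        positions = item.get(sid)
        if positions:
            earliest = ends[0]  # sorted, so any x after the earliest end is valid
            later = [p for p in positions if p > earliest]
            if later:
                out[sid] = later
    return out


def i_extend(prefix: IdList, item: IdList) -> IdList:
    """Itemset extension: x must occur in the same itemset where P ends."""
    out: IdList = {}
    for sid, ends in prefix.items():
        positions = item.get(sid)
        if positions:
            common = sorted(set(ends) & set(positions))
            if common:
                out[sid] = common
    return out


def mine(db: List[Seq], min_sup: int) -> Dict[Pattern, int]:
    """Depth-first walk of the lexicographic tree; support = number of sequences in the id-list."""
    base = {x: l for x, l in item_idlists(db).items() if len(l) >= min_sup}
    items = sorted(base)
    result: Dict[Pattern, int] = {}

    def dfs(pattern: Pattern, idl: IdList) -> None:
        result[pattern] = len(idl)
        for x in items:  # sequence-extension children
            child = s_extend(idl, base[x])
            if len(child) >= min_sup:  # anti-monotone pruning
                dfs(pattern + ((x,),), child)
        last = pattern[-1]
        for x in items:  # itemset-extension children, keeping itemsets sorted
            if x > last[-1]:
                child = i_extend(idl, base[x])
                if len(child) >= min_sup:
                    dfs(pattern[:-1] + (last + (x,),), child)

    for x in items:
        dfs(((x,),), base[x])
    return result
```

For example, on the three-sequence database `[{a}, {b}, {a, b}]`, `[{a, b}, {b}]`, `[{b}, {a}]` with `min_sup = 2`, this sketch reports six frequent patterns: `<a>` and `<b>` with support 3, and `<a, b>`, `<(ab)>`, `<b, a>` and `<b, b>` with support 2. Each pattern has exactly one parent in the tree (drop the last item of its last itemset), so every frequent pattern is generated once, and support comes from the id-list length rather than from a rescan of the database.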