FreeSpan: frequent pattern-projected sequential pattern mining
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
SPADE: an efficient algorithm for mining frequent sequences
Machine Learning
Mining Sequential Patterns: Generalizations and Performance Improvements
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Sequential PAttern mining using a bitmap representation
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth
ICDE '01 Proceedings of the 17th International Conference on Data Engineering
MICAI '08 Proceedings of the 7th Mexican International Conference on Artificial Intelligence: Advances in Artificial Intelligence
HVSM: a new sequential pattern mining algorithm using bitmap representation
ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications
Healthcare trajectory mining by combining multidimensional component and itemsets
NFMCP'12 Proceedings of the First international conference on New Frontiers in Mining Complex Patterns
Hi-index | 0.00 |
Sequential pattern mining is an important data mining task with applications in basket analysis, world wide web, medicine and telecommunication. This task is challenging because sequence databases are usually large with many and long sequences and the number of possible sequential patterns to mine can be exponential. We proposed a new sequential pattern mining algorithm called FAST which employs a representation of the dataset with indexed sparse id-lists to fast counting the support of sequential patterns. We also use a lexicographic tree to improve the efficiency of candidates generation. FAST mines the complete set of patterns by greatly reducing the effort for support counting and candidate sequences generation. Experimental results on artificial and real data show that our method outperforms existing methods in literature up to an order of magnitude or two for large datasets.