Mining sequential patterns by PrefixSpan algorithm with approximation

Authors:
Ankhbayar Yukhuu;Sansarbold Garamragchaa;Hwang Young Sup
Affiliations:
Department of Computer Science, Sun Moon University, Asan, Chungnam, South Korea;Department of Computer Science, Sun Moon University, Asan, Chungnam, South Korea;Department of Computer Science, Sun Moon University, Asan, Chungnam, South Korea
Venue:
ACS'08 Proceedings of the 8th conference on Applied computer science
Year:
2008

Abstract

We aim to find sequential patterns in a long, continuous, noisy DNA sequence. Sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database, is an important data mining problem with broad applications, including the analysis of customer purchase patterns, Web access patterns, and DNA sequences. We investigated sequential pattern mining algorithms for long continuous DNA sequences. Most previously proposed mining algorithms rely on exact matching against a sequential pattern definition, so in practice they cannot cope with noisy environments or inaccurate data. We therefore investigated an approximate matching method to handle those cases. In this paper, we develop and apply a pattern-growth PrefixSpan algorithm to find the most frequently repeated patterns, for example, motifs in a DNA sequence. Our algorithm gains its efficiency from pattern growth and approximation. It is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call approximated patterns. We developed algorithms that quickly find all relatively frequent patterns by a pattern-growth method and then determine the approximated patterns from them. Our experimental studies demonstrate that our algorithm is efficient at mining repeated approximate sequential patterns that existing methods would have missed.
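
The two stages described in the abstract, growing frequent patterns and then grouping near-identical ones, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes contiguous patterns in a single sequence, counts overlapping occurrences, and uses Hamming distance with a greedy grouping rule as a stand-in for the approximation step. The names `grow_frequent`, `group_approximate`, `min_support`, `max_len`, and `max_mismatch` are hypothetical.

```python
from collections import defaultdict


def grow_frequent(seq, min_support, max_len):
    """Pattern growth over one long sequence (assumption: contiguous patterns).

    Each prefix carries its "projection": the positions just past each of
    its occurrences. Extending a prefix only scans those positions.
    """
    frequent = {}
    # Projection of the empty prefix: every position in the sequence.
    stack = [("", list(range(len(seq))))]
    while stack:
        prefix, ends = stack.pop()
        if len(prefix) == max_len:
            continue
        # Collect, per symbol, the occurrences that the symbol extends.
        extensions = defaultdict(list)
        for e in ends:
            if e < len(seq):
                extensions[seq[e]].append(e + 1)
        for symbol, new_ends in extensions.items():
            if len(new_ends) >= min_support:
                frequent[prefix + symbol] = len(new_ends)
                stack.append((prefix + symbol, new_ends))
    return frequent


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def group_approximate(frequent, length, max_mismatch):
    """Greedy grouping (assumption): a pattern joins the first group whose
    representative is within max_mismatch of it, else starts a new group."""
    candidates = sorted(
        (p for p in frequent if len(p) == length),
        key=lambda p: (-frequent[p], p),
    )
    groups = []
    for p in candidates:
        for g in groups:
            if hamming(g["rep"], p) <= max_mismatch:
                g["members"].append(p)
                g["support"] += frequent[p]
                break
        else:
            groups.append({"rep": p, "members": [p], "support": frequent[p]})
    return groups


# Toy input: ACGT and ACGA each occur twice and differ in one position.
seq = "ACGTTTACGAGGACGTCCACGA"
frequent = grow_frequent(seq, min_support=2, max_len=4)
groups = group_approximate(frequent, length=4, max_mismatch=1)
```

On this toy input, the exact-match stage reports ACGT and ACGA as separate patterns with support 2 each, while the grouping stage merges them into one approximate pattern with combined support 4. This is the kind of repeated motif that a purely exact matcher would under-count.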