An algorithm for string matching with a sequence of don't cares
Information Processing Letters
Verifying candidate matches in sparse and wildcard matching
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Mining periodic patterns with gap requirement from sequences
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Pattern Matching with Independent Wildcard Gaps
DASC '09 Proceedings of the 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing
In biological sequence pattern mining, pattern matching is a core component that counts the matches of each candidate pattern. We consider patterns with wildcard gaps, where a wildcard gap matches any subsequence whose length lies between predefined lower and upper bounds. Because the number of candidate patterns can be huge, the efficiency of pattern matching is critical. We study two existing pattern matching algorithms: Pattern mAtching with Independent wildcard Gaps (PAIG) and Gap Constraint Search (GCS). GCS was designed for patterns with identical gaps, and we propose a revision that handles independent gaps. PAIG can handle global length constraints, whereas GCS cannot. Both algorithms have the same space complexity. GCS has the lower worst-case time complexity, whereas PAIG is more efficient in the best case. We discuss how to choose between PAIG and GCS on the basis of theoretical analysis and experimental results on a biological sequence.
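To make the notion of a pattern with independent wildcard gaps concrete, the following Python sketch counts the matches of such a pattern in a sequence with a plain dynamic program. It is not PAIG or GCS as described in the paper, and it does not handle global length constraints. The function name `count_matches` and the representation of a pattern as a list of characters plus a list of `(lower, upper)` gap bounds are assumptions made for illustration only.

```python
def count_matches(sequence, chars, gaps):
    """Count matches of chars[0] g0 chars[1] g1 ... chars[m-1] in sequence.

    gaps[k] = (lower, upper) bounds the number of wildcard positions
    between chars[k] and chars[k + 1]. Each gap has its own bounds
    (independent gaps). Hypothetical helper, not the paper's algorithm.
    """
    if not chars or len(gaps) != len(chars) - 1:
        raise ValueError("need one gap between each pair of adjacent characters")

    n = len(sequence)
    # counts[j]: number of ways to match the pattern prefix processed so far
    # such that its last character is aligned with sequence[j].
    counts = [1 if sequence[j] == chars[0] else 0 for j in range(n)]

    for k in range(1, len(chars)):
        lower, upper = gaps[k - 1]
        next_counts = [0] * n
        for j in range(n):
            if sequence[j] != chars[k]:
                continue
            # The previous character sits at position i, where the gap
            # length j - i - 1 must lie in [lower, upper].
            for i in range(max(0, j - upper - 1), j - lower):
                next_counts[j] += counts[i]
        counts = next_counts

    return sum(counts)
```

For example, `count_matches("AGTAT", ["A", "T"], [(0, 2)])` counts the pairs of an `A` followed by a `T` with at most two positions in between. Under the assumptions above, this sketch takes time proportional to the sequence length times the pattern length times the widest gap, and keeps one counter per sequence position.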