A Comparative Study of Pattern Matching Algorithms on Sequences

  • Authors:
  • Fan Min; Xindong Wu

  • Affiliations:
  • School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China 610054 and Department of Computer Science, University of Vermont, Burlington, U ...;Department of Computer Science, University of Vermont, Burlington, USA 05405 and School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China 230009

  • Venue:
  • RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
  • Year:
  • 2009

Abstract

In biological sequence pattern mining, pattern matching is a core component that counts the matches of each candidate pattern. We consider patterns with wildcard gaps. A wildcard gap matches any subsequence whose length lies between predefined lower and upper bounds. Because the number of candidate patterns can be huge, the efficiency of pattern matching is critical. We study two existing pattern matching algorithms: Pattern mAtching with Independent wildcard Gaps (PAIG) and Gap Constraint Search (GCS). GCS was designed for patterns with identical gaps, and we propose a revision that handles independent gaps. PAIG can handle global length constraints, while GCS cannot. Both algorithms have the same space complexity. In the worst case, GCS has the lower time complexity; in the best case, however, PAIG is more efficient. We discuss how to choose between PAIG and GCS, based on theoretical analysis and experimental results on a biological sequence.
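
To make the matching problem concrete, the following is a minimal Python sketch of counting matches for a pattern with independent wildcard gaps. It is neither PAIG nor GCS as studied in the paper; it is a plain dynamic-programming illustration of the problem definition. The function name `count_matches`, its parameters, and the convention that every distinct tuple of matched positions counts as one match are assumptions made for this example, and global length constraints are not modeled.

```python
def count_matches(seq, chars, gaps):
    """Count matches of a gapped pattern in seq (illustrative sketch only).

    chars: the pattern's characters, e.g. "AG"
    gaps:  one (lo, hi) pair per adjacent character pair; each pair bounds
           the number of wildcard positions allowed between the two characters
    """
    if len(gaps) != len(chars) - 1:
        raise ValueError("need exactly one gap per adjacent character pair")

    n = len(seq)
    # ways[i] = number of partial matches whose latest character is seq[i]
    ways = [1 if c == chars[0] else 0 for c in seq]

    for ch, (lo, hi) in zip(chars[1:], gaps):
        nxt = [0] * n
        for i in range(n):
            if seq[i] != ch:
                continue
            # The previous pattern character must sit g wildcards earlier,
            # for some g with lo <= g <= hi.
            for g in range(lo, hi + 1):
                prev = i - g - 1
                if prev < 0:
                    break
                nxt[i] += ways[prev]
        ways = nxt

    return sum(ways)


if __name__ == "__main__":
    # Pattern A[0,2]G: an A, then 0 to 2 arbitrary characters, then a G.
    print(count_matches("ACGTAG", "AG", [(0, 2)]))
```

Under these assumptions the sketch runs in time proportional to the sequence length times the sum of the gap widths, and it keeps only one array of counts per pattern position at a time. Because each gap carries its own bounds, the identical-gap case is simply the special case in which every `(lo, hi)` pair is the same.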