Pattern matching with wildcards based on multiple suffix trees

  • Authors:
  • Yingling Liu; Xindong Wu; Xuegang Hu; Jun Gao; Chi Wang

  • Affiliations:
  • School of Computer Science & Information Engineering, Hefei University of Technology, 230009, China (all five authors)

  • Venue:
  • GRC '12 Proceedings of the 2012 IEEE International Conference on Granular Computing (GrC-2012)
  • Year:
  • 2012

Abstract

Pattern matching with wildcards is important in many fields, such as information retrieval and bioinformatics. Suffix trees are used in pattern matching with variable-length wildcards, but constructing a suffix tree incurs significant time and space overhead. This paper presents a new pattern matching algorithm, PST, based on multiple suffix trees. PST first uses a cutting process to divide a string S into several parts and then builds a separate suffix tree for each part of S. If multiple patterns are to be retrieved, the suffix trees are adjusted at the cutting points in two ways: prefix sequence deletion with suffix sequence addition, and prefix sequence addition with suffix sequence deletion. Theoretical analysis and experiments show that PST can reduce time and space overhead compared with other algorithms.