Pattern matching with wildcards using words of shorter length

  • Authors: Meng Zhang; Yi Zhang; Liang Hu
  • Affiliations: College of Computer Science and Technology, Jilin University, Changchun, China; Department of Computer Science, Jilin Business and Technology College, Changchun, China, and College of Computer Science and Technology, Jilin University, Changchun, China; College of Computer Science and Technology, Jilin University, Changchun, China
  • Venue: Information Processing Letters
  • Year: 2010

Abstract

The problem of pattern matching with wildcards is to find all occurrences of a pattern of length m in a text of length n over a finite alphabet Σ, where both the text and the pattern may contain wildcards. Building on the prime number encoding scheme (Chaim Linhart, Ron Shamir, Faster pattern matching with character classes using prime number encoding, J. Comput. Syst. Sci. 75 (3) (2009) 155-162), we present a new integer encoding and an efficient algorithm for this problem based on the fast Fourier transform (FFT). The algorithm searches for the pattern in the text in O(n log m) time by computing a single convolution. For matching with wildcards, our encoding uses fewer prime numbers and shorter code words than the prime number encoding. We use at most 2 lg|Σ| primes to encode the symbols, whereas the prime number encoding requires |Σ| primes; this number drops to 1.5 lg|Σ| when |Σ| ≥ 40. The code words used by the algorithm are at most 2⌈lg|Σ|⌉⌈lg(5m)⌉ bits long, whereas in the prime number encoding they are at least |Σ| lg m bits long. We also show that the word length can be reduced further by computing more convolutions.
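The abstract does not spell out the new integer encoding, so the sketch below does not reproduce it. As a hedged illustration of the general idea of reducing wildcard matching to convolutions evaluated with the FFT, it uses a different, well-known sum-of-squares formulation, which needs three convolutions rather than the single convolution the abstract reports for the authors' encoding. The names `fft`, `convolve`, `encode` and `wildcard_match`, the `?` wildcard character, and the mapping of the wildcard to 0 and of symbols to 1..|Σ| are all hypothetical choices made for illustration. The pure-Python FFT keeps the example self-contained; because it transforms the whole text at once it runs in O(n log n), not O(n log m), a bound that is usually reached by processing the text in overlapping blocks of length about 2m.

```python
import cmath
from typing import List, Sequence


def fft(a: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True this computes the unnormalized inverse transform;
    the caller is responsible for dividing by len(a).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def convolve(x: Sequence[int], y: Sequence[int]) -> List[int]:
    """Integer convolution via FFT; rounding is exact for small inputs."""
    length = len(x) + len(y) - 1
    size = 1
    while size < length:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = fft([complex(v) for v in y] + [0j] * (size - len(y)))
    prod = fft([a * b for a, b in zip(fx, fy)], invert=True)
    return [round(v.real / size) for v in prod[:length]]


def encode(s: str, alphabet: str, wildcard: str = "?") -> List[int]:
    """Hypothetical encoding: wildcard -> 0, k-th alphabet symbol -> k + 1."""
    return [0 if c == wildcard else alphabet.index(c) + 1 for c in s]


def wildcard_match(text: str, pattern: str, alphabet: str) -> List[int]:
    """Positions i where pattern matches text[i:i+m]; wildcards allowed on both sides.

    Returns [] for an empty pattern or a pattern longer than the text.
    """
    t = encode(text, alphabet)
    p = encode(pattern, alphabet)
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []
    rp = p[::-1]  # reversing the pattern turns convolution into correlation
    # Score(i) = sum_j p_j * t_{i+j} * (p_j - t_{i+j})^2
    #          = sum_j p_j^3 t_{i+j} - 2 p_j^2 t_{i+j}^2 + p_j t_{i+j}^3.
    # Each term is >= 0 and is 0 exactly when either side is a wildcard (0)
    # or the two symbols are equal, so Score(i) == 0 iff position i matches.
    c1 = convolve([v ** 3 for v in rp], t)
    c2 = convolve([v ** 2 for v in rp], [v ** 2 for v in t])
    c3 = convolve(rp, [v ** 3 for v in t])
    # Entry i + m - 1 of each convolution holds the correlation at shift i.
    return [
        i
        for i in range(n - m + 1)
        if c1[i + m - 1] - 2 * c2[i + m - 1] + c3[i + m - 1] == 0
    ]


if __name__ == "__main__":
    print(wildcard_match("ab?aab", "a?b", "ab"))
```

Running it prints `[0, 3]`: the pattern `a?b` matches at position 0, where the text's own wildcard lines up with the pattern's `b`, and at position 3 (`aab`). A practical version would replace the hand-written `fft` with a library FFT and, as the abstract emphasizes, choose an encoding that keeps the convolved values small enough for the available word length.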