A Two-Tire Index Structure for Approximate String Matching with Block Moves

  • Authors:
  • Bin Wang; Long Xie; Guoren Wang

  • Affiliations:
  • Key Laboratory of Medical Image Computing (Northeastern University), Ministry of Education, and School of Information Science and Engineering, Northeastern University, Shenyang, China; Information School, Liaoning University, Shenyang, China; Key Laboratory of Medical Image Computing (Northeastern University), Ministry of Education, and School of Information Science and Engineering, Northeastern University, Shenyang, China

  • Venue:
  • Database Systems for Advanced Applications
  • Year:
  • 2009

Abstract

Many applications require approximate string matching with block moves. Computing the block edit distance between two strings is NP-complete, so our goal is to filter out as many non-candidate strings as possible. Building on two mature filtering strategies, frequency distance and positional q-grams, we propose a two-tier index structure that applies the two filters more efficiently. We give a full specification of the index structure, including how to choose the character order to improve filtering power and how to balance the number of strings across clusters. Experiments on real data sets evaluate our technique and show that the proposed index structure performs well.
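
To illustrate the two filters named in the abstract, here is a minimal Python sketch. It is not the paper's index structure. It assumes the common definition of frequency distance (the larger of the total character surplus and the total character deficit between two strings) and the usual notion of a positional q-gram as a (start position, substring) pair. The function names `frequency_distance`, `positional_qgrams`, and `frequency_filter` and the threshold `k` are hypothetical, chosen only for this example.

```python
from collections import Counter


def frequency_distance(s, t):
    """Frequency distance under the assumed definition:
    max(total surplus of characters in s, total surplus in t)."""
    cs, ct = Counter(s), Counter(t)
    surplus = sum((cs - ct).values())  # occurrences s has beyond t
    deficit = sum((ct - cs).values())  # occurrences t has beyond s
    return max(surplus, deficit)


def positional_qgrams(s, q):
    """All (start position, q-gram) pairs of s; empty if len(s) < q."""
    return [(i, s[i:i + q]) for i in range(len(s) - q + 1)]


def frequency_filter(query, strings, k):
    """Keep only strings whose frequency distance to query is at most k
    (k is a hypothetical distance threshold)."""
    return [s for s in strings if frequency_distance(query, s) <= k]


if __name__ == "__main__":
    print(frequency_distance("abcd", "cdab"))
    print(positional_qgrams("abcd", 2))
    print(frequency_filter("abcd", ["cdab", "abxy", "abcd"], 1))
```

Moving a block only reorders characters, so it leaves the character counts unchanged. Under the assumption above, a frequency-based check is therefore a cheap first pass before any expensive block edit distance computation.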