Approximate string matching by position restricted alignment

  • Authors:
  • Manish Patil;Xuanting Cai;Sharma V. Thankachan;Rahul Shah;Seung-Jong Park;David Foltz

  • Affiliations:
  • Louisiana State University;Louisiana State University;Louisiana State University;Louisiana State University;Louisiana State University;Louisiana State University

  • Venue:
  • Proceedings of the Joint EDBT/ICDT 2013 Workshops
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Given a collection of strings, goal of the approximate string matching is to efficiently find the strings in the collection that are similar to a query string. In this paper, we focus on edit distance as measure to quantify the similarity between two strings. Existing q-gram based methods to address this problem use inverted indexes to index the q-grams of given string collection. These methods begin by generating the q-grams of query string (disjoint or overlapping) and then merge the inverted lists of these q-grams. Several filtering techniques have been proposed so as to segment inverted lists to relatively shorter lists thus reducing the merging cost. We use a filtering technique which we call as "position restricted alignment" that combines well known length filtering and position filtering to provide more aggressive pruning. We then provide an indexing scheme that integrates the inverted lists storage with the proposed filter thus enabling us to auto-filter the inverted lists. We evaluate the effectiveness of the proposed approach by thorough experimentation.