Flexible and efficient string similarity search with alignment-space transform

  • Authors: Sung-Hwan Kim, Jong-Kyu Seo, Hwan-Gue Cho

  • Affiliations: Pusan National University, Busan, South Korea (all authors)

  • Venue: Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
  • Year: 2013

Abstract

String similarity search is the problem of finding the strings in a given database that are similar to a query string. The problem has many applications across computer engineering, including spelling correction, spam filtering, and information retrieval. Among the various solutions, we focus on the distance-space transform, which maps each string to a k-dimensional vector whose components are its distances from k preselected reference objects (called pivots). The resulting vectors can then be indexed with well-known multidimensional spatial data structures such as kD-trees and R*-trees. In this paper, we generalize the distance-space transform into a filtering framework and, based on this framework, present an alignment-space transform as an extension of the distance-space transform. Through experiments, we evaluate the search performance of the proposed method under a variety of search range parameters and pivot selection strategies.
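
The distance-space transform described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it does not cover the alignment-space transform. It assumes Levenshtein edit distance as the string distance, hand-picked pivots, and a linear scan over the vectors in place of a kD-tree or R*-tree; the function names `edit_distance`, `to_vector`, and `range_search` are hypothetical. The filter relies on the triangle inequality: if d(q, s) <= r, then |d(q, p) - d(s, p)| <= r for every pivot p, so any string whose vector differs from the query vector by more than r in some component can be discarded without computing its distance to the query.

```python
def edit_distance(a, b):
    """Levenshtein distance (assumed metric; the abstract does not fix one)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def to_vector(s, pivots):
    """Map a string to its k-dimensional vector of distances to the pivots."""
    return [edit_distance(s, p) for p in pivots]


def range_search(query, radius, database, pivots):
    """Return (matches, number of candidates that survived the filter)."""
    # In practice the vectors would be precomputed and stored in a spatial
    # index; a linear scan keeps this sketch short.
    vectors = [to_vector(s, pivots) for s in database]
    qv = to_vector(query, pivots)

    # Filter step: keep strings whose vector lies within `radius` of the
    # query vector in every dimension (Chebyshev distance).
    candidates = [s for s, v in zip(database, vectors)
                  if all(abs(x - y) <= radius for x, y in zip(v, qv))]

    # Refinement step: verify the survivors with the actual distance.
    matches = [s for s in candidates if edit_distance(query, s) <= radius]
    return matches, len(candidates)


database = ["kitten", "sitting", "mitten", "bitten",
            "search", "string", "similarity"]
pivots = ["kitten", "string"]  # arbitrary choice, for illustration only
matches, n_candidates = range_search("fitten", 1, database, pivots)
print(matches, n_candidates)
```

The filter never drops a true match, so the result equals that of a brute-force scan; how many non-matching candidates survive depends on the pivot selection and the search radius, which are the parameters the paper's experiments vary.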