Vector representations for efficient comparison and search for similar strings

  • Authors:
  • A. M. Sokolov

  • Affiliations:
  • International Scientific-Educational Center of Information Technologies and Systems, Kiev, Ukraine

  • Venue:
  • Cybernetics and Systems Analysis
  • Year:
  • 2007

Abstract

A method is proposed for approximating the classic edit distance between strings. It maps strings to vectors in a space with an easily computable metric. The mapping preserves the closeness of strings and makes it possible to accelerate the computation of edit distances. The developed q-gram method for approximating edit distances and its two randomized versions improve the approximation quality in comparison with well-known results.
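
The general idea of mapping strings to vectors can be illustrated with a minimal sketch. It is not the paper's algorithm or either of its randomized versions, which the abstract does not specify. It shows only the standard q-gram count vector with an L1 metric, compared against the classic dynamic-programming edit distance. The function names and the default q = 2 are assumptions made for illustration.

```python
from collections import Counter


def qgram_profile(s, q=2):
    """Count every length-q substring of s (a sparse q-gram vector)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a, b, q=2):
    """L1 distance between the q-gram count vectors of a and b."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    # Counter returns 0 for q-grams that are absent from a profile.
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())


def edit_distance(a, b):
    """Classic Levenshtein distance, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    a, b, q = "kitten", "sitting", 2
    print(qgram_distance(a, b, q), edit_distance(a, b))
```

For "kitten" and "sitting" with q = 2, the vector distance is 7 and the edit distance is 3. The vector distance takes time linear in the string lengths, whereas the dynamic program is quadratic. A single edit operation changes at most q q-grams in each string, so the L1 distance never exceeds 2q times the edit distance. A large vector distance therefore rules out a small edit distance without running the dynamic program.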