An Efficient Uniform-Cost Normalized Edit Distance Algorithm

Authors:
Abdullah N. Arslan;Omer Egecioglu
Affiliations:
-;-
Venue:
SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
Year:
1999

Citing 0
Cited 8

A new approach to sequence comparison: normalized sequence alignment
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology

Learning Significant Alignments: An Alternative to Normalized Local Alignment
ISMIS '02 Proceedings of the 13th International Symposium on Foundations of Intelligent Systems

A Normalized Levenshtein Distance Metric
IEEE Transactions on Pattern Analysis and Machine Intelligence

A Screening Method for Z-Value Assessment Based on the Normalized Edit Distance
IWANN '09 Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part II: Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living

Fast Handover in Hierarchical Mobile IPv6 Based on Motion Pattern Detection of Mobile Node
Wireless Personal Communications: An International Journal

A multi-level framework for the analysis of sequential data
Data Mining

Near-duplicate video detection featuring coupled temporal and perceptual visual structures and logical inference based matching
Information Processing and Management: an International Journal

Sequential pattern mining -- approaches and algorithms
ACM Computing Surveys (CSUR)

Abstract

A common model for computing the similarity of two strings X and Y of lengths m and n, respectively, with m ≥ n, is to transform X into Y through a sequence of three types of edit operations: insertion, deletion, and substitution. The model assumes a given cost function which assigns a non-negative real weight to each edit operation. The amortized weight for a given edit sequence is the ratio of its weight to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance. Existing algorithms for normalized edit distance computation with proven complexity bounds require O(mn^2) time in the worst case. We give an O(mn log n)-time algorithm for the problem when the cost function is uniform, i.e., the weight of each edit operation is constant within the same type, except that substitutions can have different weights depending on whether they are matching or non-matching.
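
To make the definition concrete, the following Python sketch computes the normalized edit distance with the straightforward dynamic program that tracks edit-sequence length explicitly. It illustrates the cubic-time baseline the abstract refers to, not the O(mn log n) algorithm of the paper. The function name, the parameter names, and the default weights (1 for insertion, deletion, and non-matching substitution, 0 for a matching substitution) are assumptions chosen for illustration; matching substitutions are assumed to count toward the length of the edit sequence.

```python
from math import inf


def normalized_edit_distance(x, y, w_ins=1.0, w_del=1.0, w_match=0.0, w_sub=1.0):
    """Minimum of weight/length over all edit sequences turning x into y.

    Baseline dynamic program over (i, j, k), where k is the exact number of
    edit operations used so far. Weights are uniform per operation type
    (hypothetical defaults, not taken from the paper).
    """
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return 0.0  # convention assumed here for two empty strings
    max_len = m + n  # no useful edit sequence is longer than m + n

    # prev[j][k]: least weight of a k-operation sequence turning x[:i] into y[:j]
    prev = [[inf] * (max_len + 1) for _ in range(n + 1)]
    prev[0][0] = 0.0
    for j in range(1, n + 1):
        prev[j][j] = prev[j - 1][j - 1] + w_ins  # only insertions when i == 0

    for i in range(1, m + 1):
        cur = [[inf] * (max_len + 1) for _ in range(n + 1)]
        cur[0][i] = prev[0][i - 1] + w_del  # only deletions when j == 0
        for j in range(1, n + 1):
            diag = w_match if x[i - 1] == y[j - 1] else w_sub
            for k in range(1, i + j + 1):
                cur[j][k] = min(
                    prev[j][k - 1] + w_del,      # delete x[i-1]
                    cur[j - 1][k - 1] + w_ins,   # insert y[j-1]
                    prev[j - 1][k - 1] + diag,   # match or substitute
                )
        prev = cur

    # Take the best ratio over every achievable sequence length.
    return min(prev[n][k] / k for k in range(1, max_len + 1) if prev[n][k] < inf)
```

For example, under these assumed weights, transforming "ab" into "a" by one match and one deletion has weight 1 and length 2, so the normalized distance is 0.5. The triple loop makes the length dimension explicit, which is what yields the cubic running time.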