Pairwise global alignment scores are used to detect related sequences in genomes and proteins. These scores are biased by the length and composition of the compared sequences, so the Z-value is used to estimate their statistical significance. The Z-value is computed with a Monte Carlo algorithm that requires a large number of pairwise alignments between random permutations of the compared sequences. A different alignment score, the normalized edit distance, is independent of the sequence lengths, and computing it usually takes only 2 or 3 standard alignment calculations. In this paper we study the relationship between the normalized edit distance and the Z-value, and propose a method to screen out pairs of unrelated sequences, so that the Z-value needs to be computed for only a small percentage of sequence pairs. We apply this method to the comparison of proteins from Saccharomyces cerevisiae, Escherichia coli, Methanococcus jannaschii and Haemophilus influenzae, showing that the Z-value has to be computed for less than 1% of all protein pairs.
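To make the cost of the Monte Carlo step concrete, the following is a minimal sketch of a Z-value estimate, not the procedure used in the paper. It rests on several assumptions that do not come from the abstract: a Needleman-Wunsch global alignment with illustrative unit scores (match +1, mismatch -1, linear gap -1), shuffling only the second sequence, 100 shuffles, and the hypothetical function names `global_alignment_score` and `z_value`. Real protein comparisons would typically use a substitution matrix and affine gap penalties.

```python
import random
import statistics


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def z_value(a, b, n_shuffles=100, seed=0):
    """Estimate (observed - mean) / std over alignments against shuffled b.

    Shuffling keeps the length and composition of b and destroys its order.
    Shuffling only b is an assumption; other variants shuffle both sequences.
    """
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    chars = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(chars)
        shuffled_scores.append(global_alignment_score(a, "".join(chars)))
    mean = statistics.mean(shuffled_scores)
    std = statistics.pstdev(shuffled_scores)
    if std == 0:
        # All shuffles scored the same; the Z-value is undefined, report 0.0.
        return 0.0
    return (observed - mean) / std
```

Each call to `z_value` performs `n_shuffles + 1` full alignments, which is why a screening step that needs only 2 or 3 alignments per pair can remove most of the work when the large majority of pairs are unrelated.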