A Screening Method for Z-Value Assessment Based on the Normalized Edit Distance

  • Authors: Guillermo Peris; Andrés Marzal
  • Affiliations: Universitat Jaume I (Castelló), Spain (both authors)
  • Venue: IWANN '09 Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part II: Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living
  • Year: 2009

Abstract

Pairwise global alignment scores are used to detect related genomic and protein sequences. These scores are biased by the length and composition of the compared sequences, so the Z-value is used to estimate their statistical significance. The Z-value is computed with a Monte Carlo algorithm that requires a large number of pairwise alignments between random permutations of the compared sequences. A different alignment score, the normalized edit distance, is independent of the sequence lengths, and computing it usually takes only 2 or 3 standard alignment computations. In this paper we study the relationship between the normalized edit distance and the Z-value, and propose a method to screen out pairs of unrelated sequences, so that the Z-value needs to be computed for only a small percentage of sequence pairs. We apply this method to the comparison of proteins from Saccharomyces cerevisiae, Escherichia coli, Methanococcus jannaschii and Haemophilus influenzae, showing that the Z-value has to be computed for less than 1% of all protein pairs.
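
To make the cost of the Monte Carlo step concrete, the following Python sketch estimates a Z-value as the observed global alignment score minus the mean score over shuffled sequences, divided by the standard deviation of those scores. It is an illustration under stated assumptions, not the authors' implementation: the function names, the unit match/mismatch/gap scoring, the choice to shuffle both sequences, and the default of 100 shuffles are all hypothetical and do not come from the paper.

```python
import random
import statistics


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions; protein comparison
    would normally use a substitution matrix instead.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def shuffled(seq, rng):
    """Return a random permutation of seq, preserving its composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def z_value(a, b, n_shuffles=100, seed=0):
    """Monte Carlo Z-value estimate: (observed - mean) / std over shuffles.

    Assumption: both sequences are shuffled in every trial.
    """
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    scores = [
        global_alignment_score(shuffled(a, rng), shuffled(b, rng))
        for _ in range(n_shuffles)
    ]
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        return 0.0  # degenerate case: every shuffle gave the same score
    return (observed - mean) / std
```

Each call to `z_value` performs `n_shuffles + 1` full alignments, which is the expense that a screening step based on a cheaper score is meant to avoid for most sequence pairs.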