Improved alignment of protein sequences based on common parts
ISBRA'08 Proceedings of the 4th international conference on Bioinformatics research and applications
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Relevant and Non-Redundant Amino Acid Sequence Selection for Protein Functional Site Identification
International Journal of Software Science and Computational Intelligence
Motivation: It is widely recognized that homology search and ortholog clustering are very useful for analyzing biological sequences. However, recent growth of sequence database size makes homolog detection difficult, and rapid and accurate methods are required.
Results: We present a novel method for fast and accurate homology detection, assuming that the Smith-Waterman (SW) scores between all similar sequence pairs in a target database are computed and stored. In this method, SW alignment is computed only if the upper bound, which is derived from our novel inequality, is higher than the given threshold. In contrast to other methods such as FASTA and BLAST, this method is guaranteed to find all sequences whose scores against the query are higher than the specified threshold. Results of computational experiments suggest that the method is dozens of times faster than SSEARCH if genome sequence data of closely related species are available.
Availability: The programs for fast homolog detection can be downloaded from ftp://ftp.kuicr.kyoto-u.ac.jp/itoh/
Contact: itoh@kuicr.kyoto-u.ac.jp
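The filtering idea in the Results can be illustrated with a minimal Python sketch. It is not the authors' program, and it does not use their inequality, which the abstract does not state. As a stand-in it assumes a trivial length-based upper bound (an alignment has at most min(len) matched columns), a toy scoring scheme (MATCH, MISMATCH, GAP), and hypothetical names (sw_score, length_bound, search). What it shows is the general pattern: the SW score is computed only when an upper bound exceeds the threshold, so no sequence scoring above the threshold can be missed.

```python
from typing import Callable, Dict, Tuple

# Illustrative scoring scheme (an assumption, not taken from the paper).
MATCH, MISMATCH, GAP = 2, -1, -2


def sw_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def length_bound(query: str, target: str) -> int:
    """Stand-in upper bound: at most min(len) columns can score MATCH each.

    This is NOT the paper's inequality, only a bound that is easy to verify.
    """
    return MATCH * min(len(query), len(target))


def search(
    query: str,
    database: Dict[str, str],
    threshold: int,
    upper_bound: Callable[[str, str], int] = length_bound,
) -> Tuple[Dict[str, int], int]:
    """Return hits scoring above threshold and the number of SW runs performed."""
    hits: Dict[str, int] = {}
    computed = 0
    for name, seq in database.items():
        # Safe to skip: true score <= bound <= threshold, so it cannot be a hit.
        if upper_bound(query, seq) <= threshold:
            continue
        computed += 1
        score = sw_score(query, seq)
        if score > threshold:
            hits[name] = score
    return hits, computed


if __name__ == "__main__":
    db = {"a": "ACGTACGT", "b": "ACG", "c": "TTTTTTTT"}
    print(search("ACGTACGT", db, threshold=8))
```

Because the filter only discards targets whose bound is at or below the threshold, the result matches an exhaustive SW scan. The speedup depends entirely on how tight the bound is. The length bound here is loose, whereas the method described above derives its bound from precomputed SW scores within the database.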