Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Designing seeds for similarity search in genomic DNA
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Fast and Sensitive Alignment of Large Genomic Sequences
CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Designing multiple simultaneous seeds for DNA similarity search
RECOMB '04 Proceedings of the eighth annual international conference on Resaerch in computational molecular biology
High-Performance Direct Pairwise Comparison of Large Genomic Sequences
IEEE Transactions on Parallel and Distributed Systems
Hi-index | 0.00 |
In this paper, we present a simple and efficient whole genome alignment method using maximal exact match (MEM). The major problem with the use of MEM anchor is that the number of hits in non-homologous regions increases exponentially when shorter MEM anchors are used to detect more homologous regions. To deal with this problem, we have developed a fast and accurate anchor filtering scheme based on simple match extension with minimum percent identity and extension length criteria. Due to its simplicity and accuracy, all MEM anchors in a pair of genomes can be exhaustively tested and filtered. In addition, by incorporating the translation technique, the alignment quality and speed of our genome alignment algorithm have been further improved. As a result, our genome alignment algorithm, GAME (Genome Alignment by Match Extension), performs competitively over existing algorithms and can align large whole genomes, e.g., A. thaliana, without the requirement of typical large memory and parallel processors. This is shown using an experiment which compares the performance of BLAST, BLASTZ, PatternHunter, MUMmer and our algorithm in aligning all 45 pairs of 10 microbial genomes. The scalability of our algorithm is shown in another experiment where all pairs of five chromosomes in A. thaliana were compared.