EMAGEN: an efficient approach to multiple whole genome alignment

Authors:
Jitender S. Deogun;Jingyi Yang;Fangrui Ma
Affiliations:
University of Nebraska-Lincoln, Lincoln, NE;University of Nebraska-Lincoln, Lincoln, NE;University of Nebraska-Lincoln, Lincoln, NE
Venue:
APBC '04 Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29
Year:
2004

Abstract

Following advances in biotechnology, many new whole genome sequences become available every year. Much useful information can be derived from aligning and comparing different genomes. However, most current research focuses on pairwise genome alignment, and only a few available applications can efficiently align multiple genomes. In this paper, we present an efficient approach to aligning multiple closely related whole genomes, combining suffix arrays, a graph-theoretic formulation, and existing tools for gap (short sequence) alignment. Our approach first finds a maximum set of aligned conserved regions among multiple whole genomes, then aligns the gaps between consecutive conserved regions with Clustal W. We present two methods for finding the maximum set of aligned conserved regions. In the first method, called Direct Matching (DM), multiple whole genomes are aligned using their DNA sequences. However, because most of a prokaryotic genome consists of coding regions, we introduce a second method, Functional Matching (FM), which aligns multiple prokaryotic genomes using their concatenated protein sequences. We present experimental results for both methods and analyze them. For less closely related prokaryotic genomes, the FM method generates much better results than the DM method. It outputs more and longer conserved regions, which convey more accurate and detailed information about the conservation and inheritance of genomes, and it generates more detailed alignments.
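
The two-stage idea in the abstract (select a maximum set of consistently ordered conserved regions, then align only the gaps between them) can be illustrated with a small sketch. This is not the paper's graph-theoretic formulation, and it handles only two genomes rather than many. It assumes the conserved regions have already been found (for example with a suffix array) and are given as hypothetical `(start_a, start_b, length)` tuples; the names `chain_matches` and `gaps_between` are invented for illustration. A simple quadratic dynamic program picks the collinear, non-overlapping subset with the greatest total length, and the remaining intervals are what a tool such as Clustal W would then align.

```python
from typing import List, Tuple

# Hypothetical representation of one conserved region shared by two genomes:
# (start in genome A, start in genome B, length), with length >= 1.
Match = Tuple[int, int, int]
Interval = Tuple[int, int]  # half-open [start, end)


def chain_matches(matches: List[Match]) -> List[Match]:
    """Return a maximum-total-length chain of collinear, non-overlapping matches."""
    if not matches:
        return []
    ms = sorted(matches)             # any valid predecessor sorts before its successor
    n = len(ms)
    best = [m[2] for m in ms]        # best[i]: weight of the best chain ending at match i
    prev = [-1] * n                  # back-pointers for reconstructing the chain
    for i in range(n):
        ai, bi, li = ms[i]
        for j in range(i):
            aj, bj, lj = ms[j]
            # j may precede i only if it ends before i starts in BOTH genomes.
            if aj + lj <= ai and bj + lj <= bi and best[j] + li > best[i]:
                best[i] = best[j] + li
                prev[i] = j
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ms[i])
        i = prev[i]
    chain.reverse()
    return chain


def gaps_between(chain: List[Match]) -> List[Tuple[Interval, Interval]]:
    """Return the unaligned intervals between consecutive anchors in each genome."""
    return [
        ((a1 + l1, a2), (b1 + l1, b2))
        for (a1, b1, l1), (a2, b2, _) in zip(chain, chain[1:])
    ]


if __name__ == "__main__":
    # Toy input: the match at (10, 50) is out of order in genome B and gets dropped.
    toy = [(0, 0, 5), (10, 50, 8), (20, 12, 6), (30, 25, 4), (6, 6, 3)]
    anchors = chain_matches(toy)
    print(anchors)
    print(gaps_between(anchors))
```

The quadratic loop is chosen for readability; chaining many matches across whole genomes would need a faster formulation, and extending the ordering constraint to more than two genomes is where a graph-based model becomes natural.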