Picking alignments from (Steiner) trees

Authors:
Lior Pachter;Fumei Lam
Affiliations:
U.C. Berkeley, Berkeley, CA;M.I.T., Cambridge, MA
Venue:
Proceedings of the sixth annual international conference on Computational biology
Year:
2002

Abstract

Applying Needleman-Wunsch alignment techniques to long biological sequences is complicated by two serious problems: the running time, which scales as the product of the sequence lengths, and the difficulty of obtaining parameters that produce meaningful alignments. The running-time problem is often addressed by reducing the search space, using techniques such as banding or chaining of high-scoring pairs. The parameter problem is more difficult to fix, partly because the probabilistic model to which Needleman-Wunsch is equivalent does not capture a key feature of biological sequence alignments, namely the alternation of conserved blocks and seemingly unrelated non-conserved segments. We present a solution to the problem of designing efficient search spaces for pair hidden Markov models (PHMMs) that align biological sequences by taking advantage of their associated features. Our approach leads to an optimization problem for which we obtain a 2-approximation algorithm based on the construction of Manhattan networks, which are close relatives of Steiner trees. We describe the underlying theory and show how our methods can be applied to the alignment of DNA sequences in practice, successfully reducing the Viterbi algorithm search space of alignment PHMMs by three orders of magnitude.
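
To make the running-time remark concrete, the following is a minimal sketch of Needleman-Wunsch scoring with an optional diagonal band, the simplest of the search-space reductions the abstract mentions. It is not the Manhattan-network construction of the paper. The function name `nw_score`, the scoring values (match +1, mismatch -1, linear gap -1), and the `band` parameter are assumptions chosen purely for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1, band=None):
    """Return (global alignment score, number of DP cells filled).

    If band is given, only cells with |i - j| <= band are filled, so the
    result is the best score among alignments that stay inside the band.
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    cells = 0
    for i in range(n + 1):
        lo = 0 if band is None else max(0, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            cells += 1
            if i == 0 and j == 0:
                score[i][j] = 0
                continue
            best = neg_inf
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                sub = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + sub)
            if i > 0:  # gap in b
                best = max(best, score[i - 1][j] + gap)
            if j > 0:  # gap in a
                best = max(best, score[i][j - 1] + gap)
            score[i][j] = best
    return score[n][m], cells


full_score, full_cells = nw_score("GATTACA", "GATTACA")
band_score, band_cells = nw_score("GATTACA", "GATTACA", band=1)
print(full_score, full_cells)
print(band_score, band_cells)
```

The unbanded call fills all (n + 1)(m + 1) cells, which is the product-of-lengths cost; the banded call fills only a strip around the main diagonal. In this sketch, if the two lengths differ by more than `band`, the final cell is never reached and the score is negative infinity, which shows why a fixed band is a crude search space compared with one designed around the features of the sequences.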