Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis

Authors:
Yaw-Ling Lin;Tao Jiang;Kun-Mao Chao
Venue:
MFCS '02 Proceedings of the 27th International Symposium on Mathematical Foundations of Computer Science
Year:
2002

Abstract

We study two fundamental problems concerning the search for interesting regions in sequences: (i) given a sequence of n real numbers and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum; and (ii) given a sequence of n real numbers and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. We present an O(n)-time algorithm for the first problem and an O(n log L)-time algorithm for the second. The algorithms have potential applications in several areas of biomolecular sequence analysis, including locating GC-rich regions in a genomic DNA sequence, post-processing sequence alignments, annotating multiple sequence alignments, and computing length-constrained ungapped local alignments. Our preliminary tests on both simulated and real data demonstrate that the algorithms are very efficient and able to locate useful regions, such as GC-rich ones.
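
The two problem statements in the abstract can be made concrete with a short Python sketch. It is illustrative only and is not the paper's algorithms, which the abstract does not describe. For problem (i) it swaps in a standard sliding-window minimum over prefix sums, which also runs in O(n) time. For problem (ii) it uses a simple O(nL) baseline rather than the O(n log L) method; the baseline relies on the observation that a segment of length 2L or more can be split into two pieces of length at least L, one of which has an average no smaller than the whole, so only lengths from L to 2L-1 need checking. The function names and the 0/1 scoring of G and C are assumptions made for this example.

```python
from collections import deque
from itertools import accumulate


def max_sum_at_most(a, U):
    """Problem (i): return (best_sum, start, end) for the non-empty slice
    a[start:end] with end - start <= U that has the largest sum."""
    if not a or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")
    prefix = [0, *accumulate(a)]  # prefix[k] == sum(a[:k])
    window = deque()  # candidate start indices; their prefix values increase
    best = None
    for end in range(1, len(prefix)):
        start = end - 1  # newest candidate start for this end
        # Drop candidates that can never beat the new one.
        while window and prefix[window[-1]] >= prefix[start]:
            window.pop()
        window.append(start)
        # Drop the one candidate that would make the slice longer than U.
        if window[0] < end - U:
            window.popleft()
        total = prefix[end] - prefix[window[0]]
        if best is None or total > best[0]:
            best = (total, window[0], end)
    return best


def max_average_at_least(a, L):
    """Problem (ii), O(n*L) baseline: return (best_avg, start, end) for the
    slice a[start:end] with end - start >= L that has the largest mean."""
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")
    prefix = [0, *accumulate(a)]
    best = None
    for start in range(n - L + 1):
        # Only lengths L .. 2L-1 need to be examined (see the note above).
        for end in range(start + L, min(n, start + 2 * L - 1) + 1):
            avg = (prefix[end] - prefix[start]) / (end - start)
            if best is None or avg > best[0]:
                best = (avg, start, end)
    return best


def gc_scores(dna):
    """Hypothetical scoring for GC-rich search: 1 for G or C, 0 otherwise."""
    return [1 if base in "GC" else 0 for base in dna.upper()]
```

For example, `max_average_at_least(gc_scores("ATGCGCGCAT"), 4)` reports the first length-4 window made up entirely of G and C, and `max_sum_at_most([2, -1, 3, -4, 5, 1], 2)` reports the final two elements as the heaviest segment of length at most 2. Both functions return half-open index pairs, so `a[start:end]` recovers the segment.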