An optimal algorithm for maximum-sum segment and its application in bioinformatics

Authors:
Tsai-Hung Fan;Shufen Lee;Hsueh-I Lu;Tsung-Shan Tsou;Tsai-Cheng Wang;Adam Yao
Affiliations:
Institute of Statistics, National Central University, Chung-li, Taiwan, R.O.C.;Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.;Institute of Information Science, Academia Sinica Taipei, Taiwan, R.O.C.;Institute of Statistics, National Central University, Chung-li, Taiwan, R.O.C.;Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.;Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.
Venue:
CIAA'03 Proceedings of the 8th international conference on Implementation and application of automata
Year:
2003

Citing 1
Cited 16

Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequence analysis

Journal of Computer and System Sciences - Computational biology 2002

Optimal algorithms for locating the longest and shortest segments satisfying a sum or an average constraint

Information Processing Letters
Improved algorithms for the k maximum-sums problems

Theoretical Computer Science
Randomized algorithm for the sum selection problem

Theoretical Computer Science
A geometric framework for solving subsequence problems in computational biology efficiently

SCG '07 Proceedings of the twenty-third annual symposium on Computational geometry
On the range maximum-sum segment query problem

Discrete Applied Mathematics
Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths

PEPM '08 Proceedings of the 2008 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
Algorithms for finding the weight-constrained k longest paths in a tree and the length-constrained k maximum-sum segments of a sequence

Theoretical Computer Science
Optimal algorithms for the average-constrained maximum-sum segment problem

Information Processing Letters
Optimal algorithms for locating the longest and shortest segments satisfying a sum or an average constraint

Information Processing Letters
Algorithms for computing the length-constrained max-score segments with applications to DNA copy number data analysis

ISAAC'07 Proceedings of the 18th international conference on Algorithms and computation
Improved algorithms for the k maximum-sums problems

ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
On the range maximum-sum segment query problem

ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation
Disjoint segments with maximum density

ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
An algorithm for a generalized maximum subsequence problem

LATIN'06 Proceedings of the 7th Latin American conference on Theoretical Informatics
Finding maximum sum segments in sequences with uncertainty

ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Calculational developments of new parallel algorithms for size-constrained maximum-sum segment problems

FLOPS'12 Proceedings of the 11th international conference on Functional and Logic Programming

Abstract

We study a fundamental sequence algorithm arising from bioinformatics. Given two integers L and U and a sequence A of n numbers, the maximum-sum segment problem is to find a segment A[i, j] of A with L ≤ j-i+1 ≤ U that maximizes A[i]+A[i+1]+...+A[j]. The problem has applications in finding repeats, designing low-complexity filters, and locating segments rich in C+G content in biomolecular sequences. The best known algorithm, due to Lin, Jiang, and Chao, runs in O(n) time and is based on a clever technique called left-negative decomposition of A. In this paper, we present a new O(n)-time algorithm that bypasses the left-negative decomposition. As a result, our algorithm can process the input sequence in an online manner, which is an important feature when coping with genome-scale sequences. We also show how to exploit sparsity in the input sequence: if A is representable in O(k) space in some format, then our algorithm runs in O(k) time. Moreover, a practical implementation of our algorithm, run on the rice genome, helped us identify a very long repeat structure in rice chromosome 1 that was previously unknown.
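
To make the problem statement concrete, here is a minimal Python sketch of one well-known way to solve the length-constrained maximum-sum segment problem in O(n) time: prefix sums combined with a sliding-window minimum kept in a monotonic deque. This is an illustration under stated assumptions, not necessarily the algorithm of the paper, whose details the abstract does not give. The function name `max_sum_segment`, the parameter names `lo` and `hi` (standing in for L and U), and the 0-based indexing are all choices made here for illustration.

```python
from collections import deque


def max_sum_segment(a, lo, hi):
    """Return (best_sum, i, j) for the segment a[i..j] (0-based, inclusive)
    with lo <= j - i + 1 <= hi that has maximum sum, or None if no segment
    satisfies the length bounds. Illustrative sketch, not the paper's code."""
    if lo < 1 or hi < lo or len(a) < lo:
        return None

    prefix = [0]      # prefix[t] = a[0] + ... + a[t-1]
    window = deque()  # admissible start positions, prefix values increasing
    best = None

    # Elements are consumed left to right, one at a time.
    for end, x in enumerate(a, start=1):
        prefix.append(prefix[-1] + x)

        # Start position end - lo has just become admissible.
        start = end - lo
        if start >= 0:
            # Drop candidates that can never beat the new one.
            while window and prefix[window[-1]] >= prefix[start]:
                window.pop()
            window.append(start)

        # Drop start positions that would make the segment longer than hi.
        while window and window[0] < end - hi:
            window.popleft()

        if window:
            i = window[0]  # smallest prefix value among admissible starts
            total = prefix[end] - prefix[i]
            if best is None or total > best[0]:
                best = (total, i, end - 1)

    return best


if __name__ == "__main__":
    print(max_sum_segment([2, -5, 3, 4, -1, 2, -6], 2, 3))
```

Each index enters and leaves the deque at most once, which gives the linear running time. The sketch reads the input in a single left-to-right pass; it stores the whole prefix array for simplicity, although only the most recent `hi + 1` prefix values are ever consulted.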