An optimal algorithm for maximum-sum segment and its application in bioinformatics

  • Authors:
  • Tsai-Hung Fan;Shufen Lee;Hsueh-I Lu;Tsung-Shan Tsou;Tsai-Cheng Wang;Adam Yao

  • Affiliations:
  • Institute of Statistics, National Central University, Chung-li, Taiwan, R.O.C.;Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.;Institute of Information Science, Academia Sinica Taipei, Taiwan, R.O.C.;Institute of Statistics, National Central University, Chung-li, Taiwan, R.O.C.;Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.;Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C.

  • Venue:
  • CIAA'03 Proceedings of the 8th international conference on Implementation and application of automata
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study a fundamental sequence algorithm arising from bioinformatics. Given two integers L and U and a sequence A of n numbers, the maximum-sum segment problem is to find a segment A[i, j] of A with L ≤ j-i+1 ≤ U that maximizes A[i]+A[i+1]+...+A[j]. The problem finds applications in finding repeats, designing low complexity filter, and locating segments with rich C+G content for biomolecular sequences. The best known algorithm, due to Lin, Jiang, and Chao, runs in O(n) time, based upon a clever technique called left-negative decomposition for A. In the present paper, we present a new O(n)-time algorithm that bypasses the left-negative decomposition. As a result, our algorithm has the capability to handle the input sequence in an online manner, which is clearly an important feature to cope with genome-scale sequences. We also show how to exploit the sparsity in the input sequence: If A is representable in O(k) space in some format, then our algorithm runs in O(k) time. Moreover, practical implementation of our algorithm running on the rice genome helps us to identify a very long repeat structure in rice chromosome 1 that is previously unknown.