Output-Sensitive Algorithms for Finding the Nested Common Intervals of Two General Sequences

Authors: Biing-Feng Wang
Affiliations: National Tsing Hua University, Hsinchu
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year: 2012

Abstract

This paper addresses the problem of finding all nested common intervals of two general sequences. Depending on how duplicate genes are to be treated, Blin et al. introduced three models for defining nested common intervals of two sequences: the uniqueness, the free-inclusion, and the bijection models. We consider all three models. For the uniqueness and the bijection models, we give $O(n + N_{\mathrm{out}})$-time algorithms, where $N_{\mathrm{out}}$ denotes the size of the output. For the free-inclusion model, we give an $O(n^{1+\varepsilon} + N_{\mathrm{out}})$-time algorithm, where $\varepsilon > 0$ is an arbitrarily small constant. We also present an upper bound on the size of the output for each model. For the uniqueness and the free-inclusion models, we show that $N_{\mathrm{out}} = O(n^{2})$. Let $C = \sum_{g \in \Gamma} o_{1}(g)\,o_{2}(g)$, where $\Gamma$ is the set of distinct genes, and $o_{1}(g)$ and $o_{2}(g)$ are, respectively, the numbers of copies of $g$ in the two given sequences. For the bijection model, we show that $N_{\mathrm{out}} = O(Cn)$. We also study the problem of finding all approximate nested common intervals of two sequences under the bijection model. An $O(\delta n + N_{\mathrm{out}})$-time algorithm is presented, where $\delta$ denotes the maximum number of allowed gaps. In addition, we show that for this problem $N_{\mathrm{out}}$ is $O(\delta n^{3})$.
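
The quantity C in the bijection-model bound can be computed directly from its definition. The following is a minimal sketch, not part of the paper's algorithms: the function name `copy_product_sum` and the representation of sequences as Python lists of gene labels are assumptions made here for illustration only.

```python
from collections import Counter


def copy_product_sum(seq1, seq2):
    """Return C = sum over genes g of o1(g) * o2(g).

    seq1, seq2: iterables of hashable gene labels (an assumed encoding).
    o1(g) and o2(g) are the numbers of copies of g in seq1 and seq2.
    """
    o1 = Counter(seq1)  # copies of each gene in the first sequence
    o2 = Counter(seq2)  # copies of each gene in the second sequence
    # A gene missing from either sequence contributes 0 to the sum,
    # so it is enough to iterate over the genes present in both.
    return sum(o1[g] * o2[g] for g in o1.keys() & o2.keys())
```

For example, `copy_product_sum(["a", "b", "a", "c"], ["a", "a", "b", "d"])` gives 2·2 + 1·1 = 5. When both inputs are permutations of the same n genes, every gene occurs once in each sequence, so C = n and the O(Cn) bound reduces to O(n^2).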