A New Efficient Algorithm for the Gene-Team Problem on General Sequences

Authors:
Biing-Feng Wang;Chung-Chin Kuo;Shang-Ju Liu;Chien-Hsin Lin
Affiliations:
National Tsing Hua University, Hsinchu;National Tsing Hua University, Hsinchu;National Tsing Hua University, Hsinchu;National Tsing Hua University, Hsinchu
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

References:
Fast Algorithms for Mining Association Rules in Large Databases. VLDB '94: Proceedings of the 20th International Conference on Very Large Data Bases.
Finding All Common Intervals of k Permutations. CPM '01: Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching.
An algorithmic view of gene teams. Theoretical Computer Science.
Detecting gene clusters under evolutionary constraint in a large number of genomes. Bioinformatics.
Improved Algorithms for Finding Gene Teams and Constructing Gene Team Trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Integer linear programs for discovering approximate gene clusters. WABI '06: Proceedings of the 6th International Conference on Algorithms in Bioinformatics.
Software note: Gene teams: a new formalization of gene clusters for comparative genomics. Computational Biology and Chemistry.

Cited by:
Output-Sensitive Algorithms for Finding the Nested Common Intervals of Two General Sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).

Abstract

Identifying conserved gene clusters is an important step toward understanding the evolution of genomes and predicting the functions of genes. A well-known model that captures the essential biological features of a conserved gene cluster is the gene-team model. This paper focuses on the problem of finding the gene teams of two general sequences. For this problem, He and Goldwasser gave an efficient algorithm that requires $O(mn)$ time using $O(m + n)$ working space, where $m$ and $n$ are the numbers of genes in the two given sequences, respectively. In this paper, a new efficient algorithm is presented. Assume $m \le n$. Let $C = \sum_{\alpha \in \Sigma} o_1(\alpha)\, o_2(\alpha)$, where $\Sigma$ is the set of distinct genes, and $o_1(\alpha)$ and $o_2(\alpha)$ are the numbers of copies of $\alpha$ in the first and second given sequences, respectively. Our new algorithm requires $O(\min\{C \lg n, mn\})$ time using $O(m + n)$ working space. Compared with He and Goldwasser's algorithm, our new algorithm is more practical, since $C$ is likely to be much smaller than $mn$ in practice. In addition, our new algorithm is output sensitive: its running time is $O(\lg n)$ times the size of the output. Moreover, our new algorithm can be efficiently extended to find the gene teams of $k$ general sequences in $O(kC \lg(n_1 n_2 \cdots n_k))$ time, where $n_i$ is the number of genes in the $i$th input sequence.
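The quantity $C$ can be illustrated with a small sketch. It is a minimal Python illustration of the definition only, not the authors' gene-team algorithm. It assumes each gene is represented by a hashable label; the toy sequences and the function names `match_count` and `match_count_bruteforce` are hypothetical. Because $C$ counts the pairs of positions $(i, j)$ at which the two sequences carry the same gene, it can never exceed $mn$, and it equals $mn$ only when both sequences consist of copies of a single gene.

```python
from collections import Counter


def match_count(seq1, seq2):
    """C = sum over distinct genes alpha of o1(alpha) * o2(alpha).

    Hypothetical helper: genes are any hashable labels.
    """
    o1 = Counter(seq1)  # o1[alpha]: copies of alpha in the first sequence
    o2 = Counter(seq2)  # o2[alpha]: copies of alpha in the second sequence
    # A gene missing from seq2 has o2[gene] == 0 (Counter returns 0 for
    # absent keys), so iterating over the genes of seq1 is enough.
    return sum(count * o2[gene] for gene, count in o1.items())


def match_count_bruteforce(seq1, seq2):
    """Same value, counted directly as matching position pairs (i, j).

    This O(mn) check makes the bound C <= mn evident.
    """
    return sum(1 for x in seq1 for y in seq2 if x == y)


if __name__ == "__main__":
    # Toy sequences, one character per gene (hypothetical data).
    s1, s2 = "abcab", "bacbbd"
    print(match_count(s1, s2), len(s1) * len(s2))  # prints: 9 30
```

For "abcab" and "bacbbd", the genes contribute $2 \cdot 1$ (a), $2 \cdot 3$ (b), $1 \cdot 1$ (c), and $0$ (d), so $C = 9$, whereas $mn = 30$. Counting through `Counter` takes $O(m + n)$ expected time, while the brute-force version takes $O(mn)$ time and serves only as a cross-check.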