A linear time biclustering algorithm for time series gene expression data

Authors:
Sara C. Madeira;Arlindo L. Oliveira
Affiliations:
INESC-ID, Lisbon, Portugal;INESC-ID, Lisbon, Portugal
Venue:
WABI'05 Proceedings of the 5th International Conference on Algorithms in Bioinformatics
Year:
2005

Abstract

Several unsupervised machine learning methods have been used to analyze gene expression data obtained from microarray experiments. Recently, biclustering, an unsupervised approach that clusters the rows and columns of the data matrix simultaneously, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of conditions such that the genes exhibit highly correlated behavior across those conditions. In the most common settings, biclustering is an NP-complete problem, and heuristic approaches are used to obtain suboptimal solutions with reasonable computational resources. In this work, we examine a particular setting of the problem: finding biclusters in time series expression data. In this context, we are interested in biclusters whose columns are consecutive. For this version of the problem, we propose an algorithm that finds and reports all relevant biclusters in time linear in the size of the data matrix. This complexity is achieved by manipulating a discretized version of the matrix and by using string processing techniques based on suffix trees. We report results on both synthetic and real data that show the effectiveness of the approach.
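
To make the setting concrete, the following is a minimal sketch of the problem the abstract describes, not the authors' algorithm. It assumes one possible discretization (each gene's change between consecutive time points becomes D, N, or U according to a hypothetical `threshold`), and it swaps the suffix-tree machinery for a brute-force scan over all column windows, so it does not run in linear time. The function names and parameters (`discretize`, `contiguous_biclusters`, `min_genes`, `min_cols`) are illustrative only.

```python
from collections import defaultdict


def discretize(matrix, threshold=0.5):
    """Turn each gene's time series into a string over {D, N, U}.

    Assumption: symbol j encodes the change from time point j to j + 1
    (Down, No change, Up). The abstract only says the matrix is discretized.
    """
    strings = []
    for row in matrix:
        symbols = []
        for prev, cur in zip(row, row[1:]):
            delta = cur - prev
            if delta > threshold:
                symbols.append("U")
            elif delta < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        strings.append("".join(symbols))
    return strings


def contiguous_biclusters(strings, min_genes=2, min_cols=2):
    """Brute-force search for groups of genes sharing a pattern on consecutive columns.

    Returns tuples (genes, start, end, pattern), where columns start..end-1 of the
    discretized matrix carry the same symbols for every gene in `genes`.
    This stands in for the suffix-tree approach and is NOT linear time.
    """
    n_cols = len(strings[0]) if strings else 0
    found = []
    for start in range(n_cols):
        for end in range(start + min_cols, n_cols + 1):
            groups = defaultdict(list)
            for gene, s in enumerate(strings):
                groups[s[start:end]].append(gene)
            for pattern, genes in groups.items():
                if len(genes) >= min_genes:
                    found.append((genes, start, end, pattern))
    return found


# Toy data: three genes measured at four time points.
expression = [
    [0, 1, 2, 2],
    [0, 1, 2, 0],
    [5, 4, 3, 3],
]
patterns = discretize(expression)
biclusters = contiguous_biclusters(patterns)
```

In this toy example, genes 0 and 1 both rise over the first two transitions, so they form a bicluster on consecutive columns, while gene 2 follows a different pattern. The sketch reports every qualifying window rather than only maximal biclusters, and its cost grows with the number of column windows. The abstract attributes the linear-time bound to suffix-tree string processing on the discretized matrix, which this scan does not use.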