A Trade-Off between Sample Complexity and Computational Complexity in Learning Boolean Networks from Time-Series Data

Authors:
Theodore J. Perkins; Michael T. Hallett
Affiliations:
McGill University, Montreal; McGill University, Montreal
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2010

Abstract

A key problem in molecular biology is to infer regulatory relationships between genes from expression data. This paper studies a simplified model of such inference problems in which one or more Boolean variables, modeling, for example, the expression levels of genes, each depend deterministically on a small but unknown subset of a large number of Boolean input variables. Our model assumes that the expression data comprises a time series, in which successive samples may be correlated. We provide bounds on the expected amount of data needed to infer the correct relationships between output and input variables. These bounds improve and generalize previous results for Boolean network inference and continuous-time switching network inference. Although the computational problem is intractable in general, we describe a fixed-parameter tractable algorithm that is guaranteed to provide at least a partial solution to the problem. Most interestingly, both the sample complexity and computational complexity of the problem depend on the strength of correlations between successive samples in the time series but in opposing ways. Uncorrelated samples minimize the total number of samples needed while maximizing computational complexity; a strong correlation between successive samples has the opposite effect. This observation has implications for the design of experiments for measuring gene expression.
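As a rough illustration of the inference problem described in the abstract, the Python sketch below searches for the smallest sets of input variables on which an output variable depends deterministically. It is not the authors' algorithm: it is a plain brute-force search. The function names (`consistent`, `smallest_consistent_subsets`, `correlated_series`), the per-step bit-flip model of correlation between successive samples, and the example target function are all assumptions made for illustration.

```python
from itertools import combinations
import random


def consistent(samples, subset):
    """Return True if the output is a deterministic function of the
    inputs indexed by `subset` on every (inputs, output) sample."""
    seen = {}
    for x, y in samples:
        key = tuple(x[i] for i in subset)
        # setdefault stores y on first sight, otherwise returns the stored value
        if seen.setdefault(key, y) != y:
            return False
    return True


def smallest_consistent_subsets(samples, n_inputs, max_k):
    """Brute force: return all consistent subsets of the smallest size
    up to max_k, or an empty list if none exists."""
    for k in range(max_k + 1):
        hits = [s for s in combinations(range(n_inputs), k)
                if consistent(samples, s)]
        if hits:
            return hits
    return []


def correlated_series(n_inputs, length, flip_prob, rng):
    """Hypothetical time-series model (an assumption, not from the paper):
    each input bit flips independently with probability flip_prob per step."""
    x = [rng.randint(0, 1) for _ in range(n_inputs)]
    series = []
    for _ in range(length):
        x = [b ^ (rng.random() < flip_prob) for b in x]
        series.append(tuple(x))
    return series


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed target: the output depends only on inputs 0 and 2.
    for flip_prob in (0.5, 0.05):
        series = correlated_series(n_inputs=8, length=40,
                                   flip_prob=flip_prob, rng=rng)
        samples = [(x, x[0] & x[2]) for x in series]
        print(flip_prob, smallest_consistent_subsets(samples, 8, 3))
```

In this toy model, `flip_prob = 0.5` makes successive samples independent, while smaller values make them more alike. Comparing how many candidate subsets remain for a fixed series length gives an informal feel for the sample-complexity side of the trade-off. The search over subsets is exponential in the subset size, so it does not reflect the fixed-parameter tractable algorithm the abstract mentions.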