Multiscale Binarization of Gene Expression Data for Reconstructing Boolean Networks

  • Authors:
  • Martin Hopfensitz; Christoph Müssel; Christian Wawra; Markus Maucher; Michael Kühl; Heiko Neumann; Hans A. Kestler

  • Affiliations:
  • University Hospital Ulm, Ulm University, Ulm; Ulm University, Ulm; Ulm University, Ulm; University Hospital Ulm, Ulm University, Ulm; Ulm University, Ulm; Ulm University, Ulm; University Hospital Ulm, Ulm University, Ulm

  • Venue:
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
  • Year:
  • 2012

Abstract

Network inference algorithms can assist life scientists in unraveling gene-regulatory systems at the molecular level. In recent years, the reconstruction of Boolean networks from time series has attracted considerable attention. Because such networks model genes as binary variables (either "expressed" or "not expressed"), the time series must first be binarized. Common binarization methods often cluster the measurements or separate them according to statistical or information-theoretic characteristics, and they may require many data points to determine a robust threshold. Yet time series measurements frequently comprise only a small number of samples. To overcome this limitation, we propose a binarization that incorporates measurements at multiple resolutions. We introduce two such binarization approaches that determine thresholds from limited numbers of samples and additionally provide a measure of threshold validity. Network reconstruction and further analysis can thus be restricted to genes with meaningful thresholds, which reduces the complexity of network inference. The performance of our binarization algorithms was evaluated in network reconstruction experiments using artificial data as well as real-world yeast expression time series. Compared with other binarization techniques, the new approaches yield considerably higher rates of correct network identification by effectively reducing the number of candidate networks.
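The abstract does not spell out the two algorithms. The following Python sketch illustrates only the general multiscale idea. It fits least-squares step functions with an increasing number of steps to one gene's sorted measurements and records the strongest discontinuity at each resolution. The threshold is then placed at the median of those locations. This is a simplified stand-in, not the authors' published method. The following are all illustrative assumptions:

* the largest-jump criterion
* the median rule
* the `spread` score, standing in for the paper's own threshold validity measure
* the function names `multiscale_threshold`, `binarize` and `select_genes`
* the `max_spread` cutoff

```python
# Hypothetical, simplified sketch of multiscale binarization.
# It is NOT the published algorithm from the paper; names and rules are illustrative.
from statistics import median_low


def segment_cost(prefix, prefix_sq, i, j):
    """Squared error of u[i:j] around its mean, computed from prefix sums."""
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)


def optimal_breakpoints(u, num_segments):
    """Segment start indices (excluding 0) of the least-squares step function
    with `num_segments` constant pieces fitted to the sorted values u."""
    n = len(u)
    prefix, prefix_sq = [0.0], [0.0]
    for v in u:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    inf = float("inf")
    # cost[m][j]: minimal squared error for u[:j] using exactly m segments
    cost = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    cost[0][0] = 0.0
    for m in range(1, num_segments + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Follow back pointers from the full series to recover segment starts.
    breaks, j = [], n
    for m in range(num_segments, 1, -1):
        j = back[m][j]
        breaks.append(j)
    return sorted(breaks)


def multiscale_threshold(values):
    """Return (threshold, spread) for a series of at least two values.

    For each resolution k = 2..n segments, keep the breakpoint with the largest
    jump between neighbouring segment means. The threshold lies at the lower
    median of those breakpoints. `spread` is their mean distance from that
    median divided by n (0 means all resolutions agree).
    """
    u = sorted(values)
    n = len(u)
    if n < 2:
        raise ValueError("need at least two measurements")
    positions = []
    for k in range(2, n + 1):
        breaks = optimal_breakpoints(u, k)
        bounds = [0] + breaks + [n]
        means = [sum(u[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
        # breaks[s] separates segment s from segment s + 1
        jumps = [means[s + 1] - means[s] for s in range(len(breaks))]
        best = max(range(len(jumps)), key=jumps.__getitem__)
        positions.append(breaks[best])
    pos = median_low(positions)
    threshold = (u[pos - 1] + u[pos]) / 2
    spread = sum(abs(p - pos) for p in positions) / len(positions) / n
    return threshold, spread


def binarize(values):
    """Map each measurement to 1 if it lies above the threshold, else 0."""
    threshold, _ = multiscale_threshold(values)
    return [1 if v > threshold else 0 for v in values]


def select_genes(series_by_gene, max_spread=0.1):
    """Binarize only genes whose breakpoints agree across resolutions.
    `max_spread` is an arbitrary, hypothetical cutoff."""
    kept = {}
    for gene, values in series_by_gene.items():
        threshold, spread = multiscale_threshold(values)
        if spread <= max_spread:
            kept[gene] = [1 if v > threshold else 0 for v in values]
    return kept
```

Consider the toy series `[0.1, 0.2, 0.15, 0.9, 1.0, 0.95]`. Every resolution places the strongest jump between 0.2 and 0.9. The threshold is therefore about 0.55, `spread` is 0, and the series binarizes to `[0, 0, 0, 1, 1, 1]`. The exhaustive dynamic program is roughly cubic in the number of samples per resolution. That is tolerable only because the series in question are short.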