Designing Pooling Systems for Noisy High-Throughput Protein-Protein Interaction Experiments Using Boolean Compressed Sensing

  • Authors: Ramy Mourad; Zaher Dawy; Faruck Morcos
  • Affiliations: American University of Beirut, Beirut; American University of Beirut, Beirut; Rice University, Houston
  • Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
  • Year: 2013

Abstract

Group testing, also known as pooling, is a common technique used in high-throughput experiments in molecular biology to significantly reduce the number of tests required to identify rare biological interactions while correcting for experimental noise. Central to the group testing problem are 1) a pooling design that lays out how items are grouped together into pools for testing and 2) a decoder that interprets the results of the tested pools, identifying the active compounds. In this work, we take advantage of decoder guarantees from the field of compressed sensing (CS) to address the problem of efficient and reliable detection of biological interactions in noisy high-throughput experiments. We also use efficient combinatorial algorithms from group testing as well as established measurement matrices from CS to create pooling designs. First, we formulate the group testing problem in terms of a Boolean CS framework. We then propose a low-complexity $\ell_1$-norm decoder to interpret pooling test results and identify active compounds. We demonstrate the robustness of the proposed $\ell_1$-norm decoder in simulated experiments with false-positive and false-negative error rates typical of high-throughput experiments. When benchmarked against the current state-of-the-art methods, the proposed $\ell_1$-norm decoder provides superior error correction for the majority of the cases considered while being notably faster computationally. Additionally, we test the performance of the $\ell_1$-norm decoder against a real experimental data set, where 12,675 prey proteins were screened against 12 bait proteins. Lastly, we study the impact of different sparse pooling design matrices on decoder performance and show that the shifted transversal design (STD) is the most suitable among the pooling designs surveyed for biological applications of CS.
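
As a rough illustration of the two ingredients named in the abstract, a pooling design and a decoder, the following Python sketch builds a small layered design, applies the Boolean measurement model (a pool tests positive if it contains at least one active item), and decodes the outcomes. Everything in it is an assumption made for illustration: the function names `layered_design`, `pool_outcomes`, and `threshold_decode` are hypothetical, the design is only loosely in the spirit of transversal designs and is not claimed to reproduce STD, and the decoder is a simple counting threshold standing in for the $\ell_1$-norm decoder, whose formulation the abstract does not spell out.

```python
def layered_design(q, layers):
    """Toy design: q*q items, `layers` layers of q pools each.

    Item j = q*r + c goes into pool (r + m*c) % q of layer m. For prime q and
    layers <= q, two distinct items share at most one pool.
    """
    n_items = q * q
    design = [[0] * n_items for _ in range(q * layers)]
    for j in range(n_items):
        r, c = divmod(j, q)
        for m in range(layers):
            design[m * q + (r + m * c) % q][j] = 1
    return design


def pool_outcomes(design, active):
    """Boolean measurement: a pool is positive iff it holds an active item."""
    return [int(any(row[j] for j in active)) for row in design]


def threshold_decode(design, outcomes, min_fraction=1.0):
    """Call item j active when at least `min_fraction` of its pools are positive.

    This is a stand-in counting decoder, not an l1-norm decoder.
    """
    called = []
    for j in range(len(design[0])):
        pools = [i for i, row in enumerate(design) if row[j]]
        if pools and sum(outcomes[i] for i in pools) >= min_fraction * len(pools):
            called.append(j)
    return called


design = layered_design(q=5, layers=4)   # 25 items tested with 20 pools

# Noiseless case with two active items.
clean = pool_outcomes(design, active=[7, 13])
print(threshold_decode(design, clean))           # [7, 13]

# Noisy case with one active item: one false negative, one false positive.
noisy = pool_outcomes(design, active=[7])
noisy[8] = 0   # pool 8 contains item 7 but is read as negative
noisy[0] = 1   # pool 0 does not contain item 7 but is read as positive
print(threshold_decode(design, noisy, 0.75))     # [7]
```

Because every item sits in four pools and any two items overlap in at most one, a single erroneous pool reading cannot push an inactive item to three positive pools in the noisy case above, which is why relaxing the threshold to 0.75 still recovers the active item. A decoder of the kind the abstract describes would replace the counting rule with an optimization over a sparse indicator vector.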