A statistical framework for combining and interpreting proteomic datasets

Authors:
Michael A. Gilchrist; Laura A. Salter; Andreas Wagner
Affiliations:
Department of Biology; Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87106, USA; Department of Biology
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: Accurately identifying protein function on a proteome-wide scale requires integrating data within and between high-throughput experiments. High-throughput proteomic datasets often have high error rates and thus yield incomplete and contradictory information. In this study, we develop a simple statistical framework based on Bayes' law to interpret such data and to combine information from different high-throughput experiments. To illustrate our approach, we apply it to two protein complex purification datasets.

Results: Our approach shows how to use high-throughput data to calculate accurately the probability that two proteins are part of the same complex. Importantly, it does not need a reference set of verified protein interactions to determine the false positive and false negative error rates of protein association. We also demonstrate how to combine information from two separate protein purification datasets into a combined dataset that has greater coverage and accuracy than either dataset alone. In addition, we provide a technique for estimating the total number of proteins that can be detected using a particular experimental technique.

Availability: A suite of simple programs to accomplish some of the above tasks is available at www.unm.edu/~compbio/software/DatasetAssess
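
As a rough illustration of the kind of Bayes' law calculation the abstract describes, the sketch below updates the probability that two proteins belong to the same complex from a series of observations. It is not the authors' implementation and does not reproduce their model. The function name `posterior_association`, the prior, and the error rates are hypothetical, and it assumes that the observations are conditionally independent and that the error rates are already known (the paper estimates them from the data itself, which this sketch does not attempt).

```python
def posterior_association(prior, evidence):
    """Posterior probability that two proteins share a complex.

    prior    -- assumed prior probability of a true association (0..1)
    evidence -- list of (observed, false_pos_rate, false_neg_rate) tuples,
                one per purification; rates may differ between datasets.

    Hypothetical helper: assumes observations are conditionally
    independent given the true association state.
    """
    like_assoc = 1.0      # P(evidence | proteins are associated)
    like_not_assoc = 1.0  # P(evidence | proteins are not associated)
    for observed, false_pos, false_neg in evidence:
        if observed:
            like_assoc *= 1.0 - false_neg
            like_not_assoc *= false_pos
        else:
            like_assoc *= false_neg
            like_not_assoc *= 1.0 - false_pos

    numerator = prior * like_assoc
    denominator = numerator + (1.0 - prior) * like_not_assoc
    if denominator == 0.0:
        raise ValueError("evidence is impossible under the given rates")
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    dataset_a = (True, 0.1, 0.2)   # association seen in dataset A
    dataset_b = (False, 0.2, 0.3)  # association not seen in dataset B
    print(posterior_association(0.5, [dataset_a]))
    print(posterior_association(0.5, [dataset_a, dataset_b]))
```

Passing evidence from two datasets in one call is one simple way to combine them: each observation contributes its own likelihood term, so contradictory observations pull the posterior in opposite directions according to each dataset's error rates.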