Relational operators for prioritizing candidate biomarkers in high-throughput differential expression data

Authors:
Getiria Onsongo;Hongwei Xie;Timothy J. Griffin;John V. Carlis
Affiliations:
SE, Minneapolis, MN;Mol Biology and Biophysics, Minneapolis, MN;Mol Biology, Minneapolis, MN;SE, Minneapolis, MN
Venue:
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
Year:
2010

Citing 10
Cited 0

Extensions to Query Languages for Graph Traversal Problems
IEEE Transactions on Knowledge and Data Engineering

PiQA: an algebra for querying protein data sets
SSDBM '03 Proceedings of the 15th International Conference on Scientific and Statistical Database Management

A query language for biological networks
Bioinformatics

Discovering disease-genes by topological features in human protein--protein interaction network
Bioinformatics

Graph data management for molecular and cell biology
IBM Journal of Research and Development - Systems biology

Periscope/SQ: interactive exploration of biological sequence databases
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Generating GO Slim Using Relational Database Management Systems to Support Proteomics Analysis
CBMS '08 Proceedings of the 2008 21st IEEE International Symposium on Computer-Based Medical Systems

Managing Biological Data using bdbms
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering

Evaluating Reachability Queries over Path Collections
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management

PathGen
Bioinformatics

Abstract

Recent developments in high-throughput proteomics technologies have made it possible to detect and identify low-abundance proteins. These technologies provide a new window through which proteomes can be analyzed. Despite holding great promise, mass spectrometry-based proteomics has contributed disappointingly little to the identification of novel diagnostic biomarkers. This failure has been attributed, in part, to the lack of effective strategies for determining which candidate biomarkers justify more expensive and time-consuming validation studies. An approach that bridges the gap between an unbiased experimental paradigm, which emphasizes comprehensive characterization of proteins, and a candidate-driven paradigm would overcome this limitation [38]. To this end, we have developed database operators that extend database management systems to analyze high-throughput proteomics and genomics data. By analyzing differentially expressed genes and proteins using pathway databases, these operators take advantage of established expert domain knowledge in pathway annotation to prioritize candidate biomarkers. They provide a systematic way of bridging the gap between the unbiased experimental paradigm and the candidate-driven paradigm. To test the operators, we analyzed a dataset of salivary proteins differentially expressed between pre-malignant and malignant oral lesions. Six proteins were identified as candidate biomarkers worthy of validation studies. A literature search reveals that these high-priority candidate biomarkers interact with proteins implicated in cancer development, which highlights their potential utility as biomarkers and demonstrates the effectiveness of our operators. The developed operators will help overcome one of the main challenges of high-throughput computational techniques by providing a systematic way of bridging the gap between the unbiased, data-driven approach and the hypothesis-driven approach, so that candidate biomarkers worthy of more expensive and time-consuming validation studies can be prioritized.
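
The abstract does not spell out the operators themselves, so the following is only an illustrative sketch of the general idea: keeping differential expression data and pathway annotation in a relational database and using a join to rank candidates. Everything in it is an assumption made for illustration, including the table names, the `rank_candidates` function, the fold-change threshold, the toy protein identifiers, and the scoring rule (the number of distinct known disease-associated proteins that share a pathway with a candidate). It should not be read as the authors' operators or their ranking criterion.

```python
import sqlite3


def rank_candidates(expression, pathway_members, disease_proteins,
                    min_abs_log2_fc=1.0):
    """Rank differentially expressed proteins by pathway context.

    Hypothetical scoring: count the distinct known disease-associated
    proteins that share at least one pathway with each candidate.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE expression (protein TEXT PRIMARY KEY, log2_fc REAL);
        CREATE TABLE pathway_member (pathway TEXT, protein TEXT);
        CREATE TABLE disease_protein (protein TEXT PRIMARY KEY);
    """)
    conn.executemany("INSERT INTO expression VALUES (?, ?)", expression)
    conn.executemany("INSERT INTO pathway_member VALUES (?, ?)",
                     pathway_members)
    conn.executemany("INSERT INTO disease_protein VALUES (?)",
                     [(p,) for p in disease_proteins])

    # Self-join on pathway membership finds co-members of each candidate;
    # the final join keeps only co-members flagged as disease-associated.
    rows = conn.execute("""
        SELECT e.protein, COUNT(DISTINCT d.protein) AS score
        FROM expression AS e
        JOIN pathway_member AS m1 ON m1.protein = e.protein
        JOIN pathway_member AS m2
          ON m2.pathway = m1.pathway AND m2.protein <> e.protein
        JOIN disease_protein AS d ON d.protein = m2.protein
        WHERE ABS(e.log2_fc) >= ?
        GROUP BY e.protein
        ORDER BY score DESC, e.protein ASC
    """, (min_abs_log2_fc,)).fetchall()
    conn.close()
    return rows


# Toy data, invented purely for illustration.
expression = [("P1", 2.0), ("P2", -1.5), ("P3", 0.2), ("P4", 1.2)]
pathway_members = [
    ("pwA", "P1"), ("pwA", "D1"), ("pwA", "D2"),
    ("pwB", "P2"), ("pwB", "D1"), ("pwB", "P3"),
    ("pwC", "P4"), ("pwC", "X1"),
]
disease_proteins = ["D1", "D2"]

ranked = rank_candidates(expression, pathway_members, disease_proteins)
print(ranked)
```

In this sketch, candidates below the fold-change threshold and candidates with no disease-associated pathway co-members drop out of the result because of the inner joins; a real analysis might prefer to keep them with a score of zero.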