PDB-SQL: a storage engine for macromolecular data

Authors:
Edward E. Pryor, Jr.; Jacquelyn S. Fetrow
Affiliations:
Wake Forest University, Winston-Salem, NC; Wake Forest University, Winston-Salem, NC
Venue:
ACM-SE 45: Proceedings of the 45th Annual Southeast Regional Conference
Year:
2007

Abstract

The Protein Data Bank (PDB) was established in 1971 as a repository for macromolecular crystal structure data. Recent advances in high-throughput structural genomics technologies have produced massive quantities of data, and the amount of macromolecular structure data is increasing exponentially. The original format for these files was designed to be human-readable rather than machine-readable, and limited attention was paid to standard vocabularies and data formats. As a result, it can be difficult to access these data efficiently for calculations. This paper discusses the creation of PDB-SQL, a model database originally designed to store the alpha-carbon coordinates, along with other types of information, for all protein structures in the PDB. We describe the architecture of this database and present data on the time required to populate it with all structures currently in the PDB. Comparisons of storage requirements and of the time required to perform computational tasks are also presented. Finally, we describe future development that would allow all macromolecular structure data to be stored in this database.
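The abstract does not give PDB-SQL's actual schema, so the following is only a minimal sketch of the general idea: pull alpha-carbon (CA) coordinates out of fixed-column PDB ATOM records and store them in a relational table that can be queried directly. The use of SQLite, the table name `ca_atom`, its columns, the rule of keeping only blank or "A" alternate locations, and every function name (`parse_ca_atoms`, `load_structure`, `ca_ca_distances`, `make_atom_line`) are assumptions made for illustration. The column positions follow the standard PDB-format ATOM record layout.

```python
import math
import sqlite3

# Hypothetical table: one row per alpha-carbon (CA) atom.
# This is NOT the published PDB-SQL schema, which the abstract does not describe.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ca_atom (
    pdb_id   TEXT    NOT NULL,
    chain_id TEXT    NOT NULL,
    res_seq  INTEGER NOT NULL,
    i_code   TEXT    NOT NULL,
    res_name TEXT    NOT NULL,
    x REAL NOT NULL,
    y REAL NOT NULL,
    z REAL NOT NULL,
    PRIMARY KEY (pdb_id, chain_id, res_seq, i_code)
)
"""


def parse_ca_atoms(lines):
    """Yield (chain, res_seq, i_code, res_name, x, y, z) for each alpha carbon,
    slicing the fixed 1-based PDB columns with 0-based Python indices."""
    for line in lines:
        # Only ATOM records; calcium ions in HETATM records are also named "CA".
        if not line.startswith("ATOM  "):
            continue
        # Columns 13-16: atom name; the alpha carbon is written " CA ".
        if line[12:16] != " CA ":
            continue
        # Column 17: alternate location; keep blank or "A" only (assumption).
        if line[16:17] not in (" ", "A"):
            continue
        yield (
            line[21:22],          # column 22: chain ID
            int(line[22:26]),     # columns 23-26: residue sequence number
            line[26:27].strip(),  # column 27: insertion code
            line[17:20].strip(),  # columns 18-20: residue name
            float(line[30:38]),   # columns 31-38: x (angstroms)
            float(line[38:46]),   # columns 39-46: y
            float(line[46:54]),   # columns 47-54: z
        )


def load_structure(conn, pdb_id, lines):
    """Insert all CA atoms of one entry in a single transaction; return the row count."""
    rows = [(pdb_id, *atom) for atom in parse_ca_atoms(lines)]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO ca_atom VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def ca_ca_distances(conn, pdb_id, chain_id):
    """Return (res_i, res_j, distance) for consecutive CA atoms in one chain."""
    coords = conn.execute(
        "SELECT res_seq, x, y, z FROM ca_atom "
        "WHERE pdb_id = ? AND chain_id = ? ORDER BY res_seq, i_code",
        (pdb_id, chain_id),
    ).fetchall()
    return [(a[0], b[0], math.dist(a[1:], b[1:])) for a, b in zip(coords, coords[1:])]


def make_atom_line(record, serial, name, alt_loc, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: build a fixed-width ATOM/HETATM line for sample input."""
    return (f"{record:<6}{serial:>5} {name:<4}{alt_loc:1}{res_name:>3} "
            f"{chain:1}{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00")


if __name__ == "__main__":
    sample = [
        make_atom_line("ATOM", 1, " N  ", " ", "MET", "A", 1, -1.0, 0.5, 0.0),
        make_atom_line("ATOM", 2, " CA ", " ", "MET", "A", 1, 0.0, 0.0, 0.0),
        make_atom_line("ATOM", 3, " CA ", "A", "GLY", "A", 2, 3.8, 0.0, 0.0),
        make_atom_line("ATOM", 4, " CA ", "B", "GLY", "A", 2, 3.9, 0.1, 0.0),
        make_atom_line("HETATM", 5, "CA  ", " ", " CA", "A", 101, 9.0, 9.0, 9.0),
    ]
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(load_structure(conn, "DEMO", sample), "CA atoms loaded")
    for res_i, res_j, d in ca_ca_distances(conn, "DEMO", "A"):
        print(f"A{res_i} -> A{res_j}: {d:.3f} Å")
```

Storing only the CA rows keeps the table small compared with full atom records, and a composite key on (entry, chain, residue, insertion code) lets a calculation fetch one chain's trace with a single indexed query instead of re-parsing a text file. Loading each entry with one `executemany` call inside one transaction is a common way to keep bulk population time down in SQLite; whether PDB-SQL did anything similar is not stated in the abstract.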