PDB-SQL: a storage engine for macromolecular data

  • Authors:
  • Edward E. Pryor, Jr.; Jacquelyn S. Fetrow

  • Affiliations:
  • Wake Forest University, Winston-Salem, NC; Wake Forest University, Winston-Salem, NC

  • Venue:
  • ACM-SE 45 Proceedings of the 45th annual southeast regional conference
  • Year:
  • 2007

Abstract

The Protein Data Bank (PDB) was established in 1971 as a repository for macromolecular crystal structure data. Recent development of high-throughput structural genomics technologies has produced massive quantities of data, and the amount of macromolecular structure data is increasing exponentially. The original format for these files was designed to be human-readable rather than machine-readable, and limited attention was paid to standard vocabularies and data formats. As a result, it can be difficult to access these data efficiently for calculations. This paper discusses the creation of PDB-SQL, a model database originally designed to store alpha carbon coordinates, and other types of information, for all protein structures in the PDB. We describe the architecture of this database and present timing data for populating it with all structures currently in the PDB. Comparisons of storage requirements and of the time required to perform computational tasks are presented. Finally, we describe future development that would allow all macromolecular structure data to be stored in this database.
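
The abstract does not give the PDB-SQL schema, so the following is only a minimal sketch of the general idea: pulling alpha carbon coordinates out of the fixed-column ATOM records of a PDB file and loading them into a relational table. The table name `ca_atom`, its columns, the function names, and the use of SQLite are all assumptions made for illustration and are not taken from the paper.

```python
import sqlite3


def parse_ca_atoms(pdb_text):
    """Yield (chain, res_seq, res_name, x, y, z) for each alpha carbon.

    Relies on the fixed-column layout of PDB ATOM records:
    atom name in columns 13-16, residue name in 18-20, chain ID in 22,
    residue number in 23-26, and x, y, z in 31-38, 39-46, 47-54.
    """
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            yield (
                line[21],
                int(line[22:26]),
                line[17:20].strip(),
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )


def load_structure(conn, pdb_id, pdb_text):
    """Insert the alpha carbons of one structure; return the row count.

    The schema below is hypothetical, not the one described in the paper.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ca_atom ("
        "pdb_id TEXT, chain TEXT, res_seq INTEGER, res_name TEXT, "
        "x REAL, y REAL, z REAL)"
    )
    rows = [(pdb_id, *atom) for atom in parse_ca_atoms(pdb_text)]
    conn.executemany(
        "INSERT INTO ca_atom VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Once coordinates are in a table like this, a calculation can select only the rows it needs (for example, all alpha carbons of one chain ordered by `res_seq`) instead of re-parsing the flat file each time, which is the kind of access pattern the abstract says the original format makes difficult.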