Efficient data management in a large-scale epidemiology research project

  • Authors:
  • Jens Meyer; Stefan Ostrzinski; Daniel Fredrich; Christoph Havemann; Janina Krafczyk; Wolfgang Hoffmann

  • Affiliations:
  • Institute of Community Medicine, Section Epidemiology of Health Care and Community Health, Ellernholzstr. 1-2, 17487 Greifswald, Germany (all authors)

  • Venue:
  • Computer Methods and Programs in Biomedicine
  • Year:
  • 2012

Abstract

This article describes the concept of a "Central Data Management" (CDM) and its implementation within the large-scale population-based medical research project "Personalized Medicine". The CDM can be summarized as a combination of data capturing, data integration, data storage, data refinement, and data transfer. A wide spectrum of reliable "Extract Transform Load" (ETL) software for automatic integration of data, as well as "electronic Case Report Forms" (eCRFs), was developed in order to integrate decentralized and heterogeneously captured data. Because the captured data are highly sensitive, high system availability, data privacy, data security, and quality assurance are of utmost importance. A complex data model was developed and implemented using an Oracle database in high-availability cluster mode in order to integrate different types of participant-related data. Intelligent data capturing and storage mechanisms improve the quality of the data. Data privacy is ensured by a multi-layered role/right system for access control and by de-identification of identifying data. A well-defined backup process prevents data loss. Over a period of one and a half years, the CDM has captured a wide variety of data totaling approximately 5 terabytes without experiencing any critical incidents of system breakdown or loss of data. The aim of this article is to demonstrate one possible way of establishing a Central Data Management in large-scale medical and epidemiological studies.
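
To make the combination of ETL and de-identification more concrete, the following is a minimal sketch of a transform step that strips identifying fields and replaces the participant ID with a keyed pseudonym. It is not the authors' implementation: the abstract does not state how de-identification is performed, so the use of HMAC-SHA256, the field names (`participant_id`, `name`, `date_of_birth`, `address`), and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical identifying fields; the article does not list its data model.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "address"}


def pseudonym(participant_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a participant ID (assumed: HMAC-SHA256)."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Transform step: drop identifying fields, swap the ID for a pseudonym."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in IDENTIFYING_FIELDS and key != "participant_id"
    }
    cleaned["pseudonym"] = pseudonym(record["participant_id"], secret_key)
    return cleaned


def etl(records, secret_key: bytes) -> list:
    """Extract (iterate the source), transform (de-identify), load (collect)."""
    return [deidentify(record, secret_key) for record in records]
```

In such a sketch the secret key would be held separately from the research data, so that the same participant always maps to the same pseudonym while the mapping cannot be recomputed by users who only have access to the de-identified records.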