Pivoting approaches for bulk extraction of Entity-Attribute-Value data

Authors:
Valentin Dinu;Prakash Nadkarni;Cynthia Brandt
Affiliations:
Center for Medical Informatics, Yale University School of Medicine, PO Box 208009, New Haven, CT 06520-8009, United States;Center for Medical Informatics, Yale University School of Medicine, PO Box 208009, New Haven, CT 06520-8009, United States;Center for Medical Informatics, Yale University School of Medicine, PO Box 208009, New Haven, CT 06520-8009, United States
Venue:
Computer Methods and Programs in Biomedicine
Year:
2006

Abstract

Entity-Attribute-Value (EAV) data, such as those stored in repositories of clinical patient data, must be transformed (pivoted) into one-column-per-parameter format before they can be used by most analytical programs. Pivoting approaches have not been described in depth in the literature, and the existing descriptions are dated. We describe and benchmark three alternative algorithms for pivoting clinical data in the context of a clinical study data management system. We conclude that when the number of attributes to be returned is not too large, it is feasible to use static SQL as the basis for views on the data. An alternative but more complex approach, which uses hash tables and relies on abundant random-access memory, can achieve better performance by reducing the load on the database server.
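
To make the idea of pivoting concrete, the following is a minimal sketch of an in-memory, hash-table-based pivot. It is not the authors' implementation, which the abstract does not detail. The function name `pivot_eav`, the sample attribute names, and the choice to let the last value win for a repeated entity/attribute pair are all assumptions made for illustration.

```python
from collections import defaultdict


def pivot_eav(rows, attributes):
    """Pivot (entity, attribute, value) triples into one tuple per entity.

    Hypothetical helper for illustration only. `attributes` fixes the
    order of the output columns; attributes with no recorded value for
    an entity are returned as None.
    """
    wanted = set(attributes)
    table = defaultdict(dict)  # hash table: entity -> {attribute: value}
    for entity, attribute, value in rows:
        if attribute in wanted:
            # Assumption: a later value overwrites an earlier one.
            table[entity][attribute] = value
    return [
        (entity, *(values.get(a) for a in attributes))
        for entity, values in sorted(table.items())
    ]


# Made-up sample data in EAV form: one row per recorded fact.
eav_rows = [
    (1, "heart_rate", 72),
    (1, "systolic_bp", 120),
    (2, "heart_rate", 80),
    (2, "weight_kg", 70.5),
]

pivoted = pivot_eav(eav_rows, ["heart_rate", "systolic_bp"])
print(pivoted)
```

In this sketch the database only has to stream the raw EAV rows once, and the grouping happens in the client's memory, which is the general trade-off the abstract attributes to the hash-table approach.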