Improving the diagnosis of mild hypertrophic cardiomyopathy with MapReduce

  • Authors:
  • Pantazis Deligiannis, Heriot-Watt University, Edinburgh, United Kingdom
  • Hans-Wolfgang Loidl, Heriot-Watt University, Edinburgh, United Kingdom
  • Evangelia Kouidi, Aristotle University of Thessaloniki, Thessaloniki, Greece

  • Venue:
  • Proceedings of the Third International Workshop on MapReduce and its Applications
  • Year:
  • 2012


Abstract

Hypertrophic Cardiomyopathy (HCM), an inherited heart disease, is the most common cause of sudden cardiac death in young athletes. Successful diagnosis of mild HCM presents a major medical challenge, especially in athletes whose exercise-induced hypertrophy overlaps with HCM. This is due to a wide spectrum of non-specific clinical parameters and their complex dependencies. Recently, medical researchers proposed multidisciplinary strategies, defining differential diagnostic scoring algorithms, with the goal of identifying which parameters correlate with HCM in order to achieve faster and more accurate diagnosis. These algorithms require extensive testing against large medical datasets in order to identify potential correlations and to assess overall algorithmic quality and diagnostic accuracy. We present a prototype data-parallel algorithm for improving the diagnosis of mild HCM by refining the set of parameters contributing to the main diagnostic function. To this end, we employ a rule-based, machine-learning approach and develop an iterative MapReduce application for applying the diagnostic function to large datasets. The core component of the algorithm, including the diagnostic function, has been implemented in Java, Pig and Hive in order to identify potential productivity gains from using a high-level MapReduce language specifically for medical applications. Finally, we assess the algorithmic performance on up to 64 cores of our Hadoop-enabled (version 0.20.1) Beowulf cluster, achieving near-linear speedups while reducing the overall runtime from over 9 hours to a couple of minutes for a realistic dataset of 10,000 medical records.
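To make the data-parallel structure concrete, the sketch below shows the kind of per-record work the map phase of such an iterative job would perform: applying a weighted diagnostic scoring function to each medical record and emitting a classification. This is a minimal illustration only, not the paper's algorithm; the parameter names, weights, and threshold are hypothetical, and the Hadoop plumbing is replaced by a plain in-memory map over the records.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the mapper's per-record work in an iterative
// MapReduce diagnostic job. Parameter names and weights are illustrative.
public class HcmScoreSketch {

    // A record maps clinical parameter names (all hypothetical here)
    // to measured values; the score is a weighted sum over whichever
    // parameters the current iteration includes. Refining this weight
    // set is what the iterative driver would do between jobs.
    public static double score(Map<String, Double> record,
                               Map<String, Double> weights) {
        return weights.entrySet().stream()
                .mapToDouble(e -> e.getValue()
                        * record.getOrDefault(e.getKey(), 0.0))
                .sum();
    }

    // In-memory analogue of the map phase: classify every record in
    // the dataset against a diagnostic threshold.
    public static List<Boolean> mapPhase(List<Map<String, Double>> records,
                                         Map<String, Double> weights,
                                         double threshold) {
        return records.stream()
                .map(r -> score(r, weights) >= threshold)
                .collect(Collectors.toList());
    }
}
```

In a real Hadoop job, `mapPhase` would instead be a `Mapper` emitting (record-id, classification) pairs, and a reducer would aggregate agreement with known diagnoses to assess the current parameter set.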