Improving the diagnosis of mild hypertrophic cardiomyopathy with MapReduce

Authors:
Pantazis Deligiannis;Hans-Wolfgang Loidl;Evangelia Kouidi
Affiliations:
Heriot-Watt University, Edinburgh, United Kingdom;Heriot-Watt University, Edinburgh, United Kingdom;Aristotle University of Thessaloniki, Thessaloniki, Greece
Venue:
Proceedings of the third international workshop on MapReduce and its Applications
Year:
2012

Abstract

Hypertrophic Cardiomyopathy (HCM), an inherited heart disease, is the most common cause of sudden cardiac death in young athletes. Successful diagnosis of mild HCM presents a major medical challenge, especially in athletes whose exercise-induced hypertrophy overlaps with HCM. The difficulty stems from a wide spectrum of non-specific clinical parameters and their complex dependencies. Recently, medical researchers have proposed multidisciplinary strategies that define differential diagnostic scoring algorithms, with the goal of identifying which parameters correlate with HCM in order to achieve faster and more accurate diagnosis. These algorithms require extensive testing against large medical datasets in order to identify potential correlations and to assess the overall algorithmic quality and diagnostic accuracy. We present a prototype data-parallel algorithm for improving the diagnosis of mild HCM by refining the set of parameters contributing to the main diagnostic function. To this end, we employ a rule-based, machine-learning approach and develop an iterative MapReduce application for applying the diagnostic function to large datasets. The core component of the algorithm, including the diagnostic function, has been implemented in Java, Pig and Hive in order to identify potential productivity gains from using a high-level MapReduce language specifically for medical applications. Finally, we assess the algorithmic performance on up to 64 cores of our Hadoop-enabled (version 0.20.1) Beowulf cluster, achieving near-linear speedups while reducing the overall runtime from over 9 hours to a couple of minutes for a realistic dataset of 10,000 medical records.
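
As an illustration only, one round of such a parameter-refinement job could be sketched as below. This is not the authors' implementation: it simulates the map and reduce phases in plain Python rather than on Hadoop, and the parameter names (`p1`, `p2`, `p3`), the weights, the threshold, and the sample records are all hypothetical placeholders with no medical meaning. The map step applies a rule-based score to each record for every candidate parameter set, and the reduce step aggregates how often each set agrees with the known diagnosis.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical point weights per parameter; not taken from the paper.
WEIGHTS = {"p1": 2, "p2": 1, "p3": 1}
# Hypothetical cut-off: a score at or above this value predicts HCM.
THRESHOLD = 2


def diagnose(record, params):
    """Rule-based score: sum the weights of the selected parameters that are flagged."""
    score = sum(WEIGHTS[p] for p in params if record[p])
    return score >= THRESHOLD


def map_phase(record, candidate_sets):
    """Emit (parameter set, 1 or 0) depending on whether the prediction matches the label."""
    for params in candidate_sets:
        yield params, int(diagnose(record, params) == record["hcm"])


def reduce_phase(pairs):
    """Aggregate the emitted pairs into an accuracy per parameter set."""
    hits, totals = defaultdict(int), defaultdict(int)
    for params, hit in pairs:
        hits[params] += hit
        totals[params] += 1
    return {params: hits[params] / totals[params] for params in totals}


def run_round(records, candidate_sets):
    """One map/reduce round over all records (sequential stand-in for a cluster job)."""
    pairs = (pair for record in records for pair in map_phase(record, candidate_sets))
    return reduce_phase(pairs)


def select_best(accuracy):
    """Pick the most accurate set, preferring fewer parameters on ties."""
    return min(accuracy, key=lambda params: (-accuracy[params], len(params)))


# Made-up labelled records purely for demonstration.
RECORDS = [
    {"p1": True, "p2": False, "p3": True, "hcm": True},
    {"p1": False, "p2": True, "p3": True, "hcm": False},
    {"p1": True, "p2": True, "p3": False, "hcm": True},
    {"p1": False, "p2": False, "p3": True, "hcm": False},
]

# Every non-empty subset of the hypothetical parameters is a candidate.
CANDIDATES = [c for r in range(1, len(WEIGHTS) + 1) for c in combinations(WEIGHTS, r)]

if __name__ == "__main__":
    accuracy = run_round(RECORDS, CANDIDATES)
    print(select_best(accuracy), accuracy)
```

In an iterative setting, the output of `select_best` would seed the candidate sets for the next round. Because each record is scored independently in the map step, the work partitions naturally across cores, which is consistent with the near-linear speedups reported above.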