Large-scale multimodal mining for healthcare with mapreduce

Authors:
Fei Wang; Vuk Ercegovac; Tanveer Syeda-Mahmood; Akintayo Holder; Eugene Shekita; David Beymer; Lin Hao Xu
Affiliations:
IBM Research Almaden, San Jose, CA, USA; IBM Research Almaden, San Jose, CA, USA; IBM Research Almaden, San Jose, CA, USA; RPI, Troy, NY, USA; IBM Research Almaden, San Jose, CA, USA; IBM Research Almaden, San Jose, CA, USA; IBM Research China, Beijing, China
Venue:
Proceedings of the 1st ACM International Health Informatics Symposium
Year:
2010

Abstract

Recent advances in healthcare and bioscience technologies, together with the proliferation of portable medical devices, have produced massive amounts of multimodal data. Mining these data sets, which range from tens of gigabytes to terabytes or even petabytes, clearly calls for parallel processing. AALIM (Advanced Analytics for Information Management) is a new multimodal mining-based clinical decision support system that brings together patient data captured in many modalities to provide a holistic presentation of a patient's exam data, diseases, and medications. It also offers disease-specific similarity search based on the various data modalities. However, the currently deployed AALIM system can process only a limited amount of patient data per day. In this paper, we address this limitation by building a healthcare multimodal mining system on top of the MapReduce framework, specifically its popular open-source implementation, Hadoop. We present a scalable and generic framework that enables automatic parallelization of healthcare multimodal mining algorithms and distributes large-scale computation across clusters of commodity servers to achieve high performance. Initial tests of porting a single AALIM module (EKG period estimation) to Hadoop on a cluster of servers show very promising results.
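The abstract does not describe how the EKG period estimation module is mapped onto Hadoop. The Python below is a minimal, hypothetical sketch of the general MapReduce pattern the framework relies on. It simulates the map, shuffle, and reduce phases in a single process. Each map call estimates a period from one EKG record independently, and each reduce call averages the estimates for one patient. Several pieces are illustrative assumptions and do not come from AALIM or from Hadoop's API: the autocorrelation-based estimator, the `(patient_id, samples, fs)` record layout, the synthetic spike-train input, and the names `estimate_period`, `map_fn`, `reduce_fn`, `run_job`, and `spike_train`.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: (patient_id, EKG samples, sampling rate in Hz).
Record = Tuple[str, List[float], float]


def estimate_period(samples: List[float], fs: float,
                    min_period_s: float = 0.3, max_period_s: float = 2.0) -> float:
    """Return the lag (in seconds) with the largest autocorrelation.

    A stand-in estimator chosen for illustration; AALIM's actual EKG period
    estimation method is not described in the abstract.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove the DC offset
    lo = int(min_period_s * fs)
    hi = min(int(max_period_s * fs), n - 1)
    best_lag = max(
        range(lo, hi + 1),
        key=lambda k: sum(x[i] * x[i + k] for i in range(n - k)),
    )
    return best_lag / fs


def map_fn(record: Record) -> Iterator[Tuple[str, float]]:
    # Each record is processed independently, which is what would let a
    # MapReduce framework run map tasks on different nodes.
    patient_id, samples, fs = record
    yield patient_id, estimate_period(samples, fs)


def reduce_fn(patient_id: str, periods: List[float]) -> Tuple[str, float]:
    # Combine all per-record estimates for one patient into a mean period.
    return patient_id, sum(periods) / len(periods)


def run_job(records: List[Record]) -> Dict[str, float]:
    """Single-process simulation of the map, shuffle, and reduce phases."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for record in records:                    # map phase
        for key, value in map_fn(record):
            groups[key].append(value)         # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))  # reduce phase


def spike_train(period_samples: int, n: int) -> List[float]:
    # Synthetic impulses standing in for R-peaks; not real EKG data.
    return [1.0 if i % period_samples == 0 else 0.0 for i in range(n)]


records = [
    ("p1", spike_train(80, 800), 100.0),
    ("p1", spike_train(80, 800), 100.0),
    ("p2", spike_train(75, 800), 100.0),
]
print(run_job(records))  # {'p1': 0.8, 'p2': 0.75}
```

Each map call touches only one record, so a framework such as Hadoop could schedule these calls on different machines. Only the grouped intermediate key/value pairs need to move between the map and reduce phases. In a real deployment, the sequential loop in `run_job` would be replaced by the framework's own job scheduling.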