Large-scale multimodal mining for healthcare with mapreduce

  • Authors:
  • Fei Wang;Vuk Ercegovac;Tanveer Syeda-Mahmood;Akintayo Holder;Eugene Shekita;David Beymer;Lin Hao Xu

  • Affiliations:
  • IBM Research Almaden, San Jose, CA, USA;IBM Research Almaden, San Jose, CA, USA;IBM Research Almaden, San Jose, CA, USA;RPI, Troy, NY, USA;IBM Research Almaden, San Jose, CA, USA;IBM Research Almaden, San Jose, CA, USA;IBM Research China, Beijing, China

  • Venue:
  • Proceedings of the 1st ACM International Health Informatics Symposium
  • Year:
  • 2010

Abstract

Recent advances in healthcare and bioscience technologies and the proliferation of portable medical devices have produced massive amounts of multimodal data. The need for parallel processing is apparent for mining these data sets, which can range from tens of gigabytes to terabytes or even petabytes. AALIM (Advanced Analytics for Information Management) is a new multimodal mining-based clinical decision support system that brings together patient data captured in many modalities to provide a holistic presentation of a patient's exam data, diseases, and medications. In addition, it offers disease-specific similarity search based on the various data modalities. The currently deployed AALIM system can process only a limited amount of patient data per day. In this paper, we attempt to address the challenge of building a healthcare multimodal mining system on top of the MapReduce framework, specifically its popular open-source implementation, Hadoop. We present a scalable and generic framework that enables automatic parallelization of healthcare multimodal mining algorithms and distribution of large-scale computation, achieving high performance on clusters of commodity servers. Initial testing of importing a single AALIM module (EKG period estimation) using Hadoop on a cluster of servers shows very promising results.
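
The abstract does not say how EKG period estimation is computed or how it is split into map and reduce steps, so the following is only an illustrative sketch of how such a per-record task could fit the MapReduce model. It assumes a simple autocorrelation-based period estimate (not necessarily the method AALIM uses), a hypothetical record layout of (patient_id, samples), and an in-process driver standing in for Hadoop's shuffle phase. All function names are made up for illustration.

```python
import math
from collections import defaultdict


def estimate_period(samples):
    """Estimate the dominant period of a signal, in samples.

    Assumption: plain autocorrelation is used here as a stand-in for
    whatever estimator the real system applies. Returns None when no
    period can be identified (e.g. a flat signal).
    """
    n = len(samples)
    if n < 4:
        return None
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    max_lag = n // 2
    # Unnormalized autocorrelation for lags 0..max_lag.
    ac = [
        sum(centered[i] * centered[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]
    # Skip the trivial peak around lag 0 by starting the search at the
    # first lag where the autocorrelation turns negative.
    start = next((lag for lag, value in enumerate(ac) if value < 0), None)
    if start is None:
        return None
    return max(range(start, max_lag + 1), key=lambda lag: ac[lag])


def map_record(record, sample_rate_hz):
    """Map step: one EKG segment in, (patient_id, period in seconds) out."""
    patient_id, samples = record
    lag = estimate_period(samples)
    if lag is not None:
        yield patient_id, lag / sample_rate_hz


def reduce_periods(patient_id, periods):
    """Reduce step: average all period estimates for one patient."""
    return patient_id, sum(periods) / len(periods)


def run_local(records, sample_rate_hz):
    """Tiny in-process stand-in for the map, shuffle, and reduce phases."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record, sample_rate_hz):
            grouped[key].append(value)
    return dict(reduce_periods(key, values) for key, values in grouped.items())


def synthetic_signal(period, n=200):
    """Hypothetical test input: a sine wave with a known period in samples."""
    return [math.sin(2 * math.pi * i / period) for i in range(n)]


records = [
    ("patient-a", synthetic_signal(25)),
    ("patient-a", synthetic_signal(25)),
    ("patient-b", synthetic_signal(40)),
]
result = run_local(records, sample_rate_hz=25.0)
print(result)
```

Because each segment is processed independently in the map step, the work parallelizes across records without coordination, which is the property that makes this kind of module a natural first candidate for a Hadoop cluster.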