Integrating MapReduce and RDBMSs

  • Authors:
  • Natalie Gruska; Patrick Martin

  • Affiliations:
  • Queen's University, Kingston, ON, Canada; Queen's University, Kingston, ON, Canada

  • Venue:
  • Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
  • Year:
  • 2010

Abstract

Data processing needs are changing with the ever-increasing amounts of both structured and unstructured data. While the processing of structured data typically relies on the well-developed field of relational database management systems (RDBMSs), MapReduce is a programming model developed to cope with processing immense amounts of unstructured data. MapReduce, however, offers features and advantages that can also be exploited to process structured data. Several database vendors and researchers have already turned to MapReduce to aid in processing relational data, which requires integrating MapReduce and RDBMS technologies. In this paper, we provide a taxonomy to characterize several existing integration methods. Further, we take a detailed look at DBInputFormat, an interface between Hadoop's MapReduce and a relational database. We identify the challenges posed by such an interface and provide suggestions for improvement.