VDB-MR: MapReduce-based distributed data integration using virtual database

  • Authors:
  • Yulai Yuan; Yongwei Wu; Xiao Feng; Jing Li; Guangwen Yang; Weimin Zheng

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, 100084, China (all authors)

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2010

Abstract

Data integration is becoming increasingly important in many commercial applications and in scientific research. Many algorithms and systems have been proposed and developed to address related issues from different angles. Virtual database systems are widely recognized as an effective solution for data integration. However, the existing execution modules in virtual database systems are inefficient. MapReduce (MR) is a new computing model for parallel processing that performs well on large-scale data. In this paper, we propose a new distributed data integration system, called VDB-MR, which is based on MapReduce and efficiently integrates data from heterogeneous data sources. With VDB-MR, a unified view (i.e., a single virtual database) of multiple databases can be provided to users. We also conducted a series of experiments to evaluate VDB-MR, comparing it with the open source data integration system OGSA-DAI and with two DBMSs running in parallel. The experimental results show that VDB-MR significantly outperforms both OGSA-DAI and the parallel DBMSs.
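
The abstract does not describe how VDB-MR is implemented, so the following is only a minimal, single-process sketch of the general idea it names: using map, shuffle, and reduce steps to join records from two differently structured sources into one unified view. The sample data, the schemas, and every function name (`map_customers`, `map_orders`, `shuffle`, `reduce_join`, `unified_view`) are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical sources: two "databases" with different schemas (illustrative only).
CUSTOMERS_DB = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Lin"},
]
ORDERS_DB = [
    {"customer": 1, "total": 30.0},
    {"customer": 1, "total": 12.5},
    {"customer": 2, "total": 7.0},
]


def map_customers(row):
    # Emit (join key, tagged record) so the reducer can tell the sources apart.
    yield row["cust_id"], ("customer", row["name"])


def map_orders(row):
    # The orders source names its key column differently; the mapper normalizes it.
    yield row["customer"], ("order", row["total"])


def shuffle(pairs):
    # Group all mapped values by key, as a MapReduce shuffle phase would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    # Combine the records of both sources into one row of the unified view.
    names = [v for tag, v in values if tag == "customer"]
    totals = [v for tag, v in values if tag == "order"]
    for name in names:
        yield {"cust_id": key, "name": name, "order_total": sum(totals)}


def unified_view():
    # Run the map phase over each source, then shuffle and reduce.
    pairs = []
    for row in CUSTOMERS_DB:
        pairs.extend(map_customers(row))
    for row in ORDERS_DB:
        pairs.extend(map_orders(row))
    rows = []
    for key, values in sorted(shuffle(pairs).items()):
        rows.extend(reduce_join(key, values))
    return rows
```

Calling `unified_view()` returns one row per customer with the order totals aggregated across both sources. In a real deployment the map and reduce functions would run on separate workers and read from actual database connections; this sketch keeps everything in memory to show only the data flow.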