Future Generation Computer Systems
Data integration is increasingly important in commercial applications and scientific research, and many algorithms and systems have been proposed to address different aspects of the problem. Virtual database systems are widely recognized as an effective approach to data integration. The execution modules in existing virtual database systems, however, are ineffective. MapReduce (MR) is a computing model for parallel processing that performs well on large-scale data. In this paper, we propose a new distributed data integration system, called VDB-MR, which is built on MapReduce to efficiently integrate data from heterogeneous data sources. VDB-MR provides users with a unified view (i.e., a single virtual database) of multiple databases. We also conducted a series of experiments to evaluate VDB-MR, comparing it with the open-source data integration system OGSA-DAI and with two DBMSs running in parallel. The experimental results show that VDB-MR significantly outperforms both OGSA-DAI and the DBMSs running in parallel.
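To make the idea of a MapReduce-backed unified view concrete, the following is a minimal single-process sketch in Python. It is not the VDB-MR implementation, whose internals the abstract does not describe. The two sources, their schemas, and every function name here are hypothetical, chosen only to illustrate how map, shuffle, and reduce steps could merge records from heterogeneous sources under a shared key.

```python
import csv
import io
from collections import defaultdict

# Hypothetical source 1: rows as they might be returned by a relational DBMS.
CUSTOMERS_DB = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Grace"},
]

# Hypothetical source 2: a CSV export from a different system and schema.
ORDERS_CSV = "customer,amount\n1,30.0\n1,12.5\n2,99.0\n"


def map_customers(row):
    """Map step for source 1: emit (join key, tagged value)."""
    yield row["cust_id"], ("customer", row["name"])


def map_orders(row):
    """Map step for source 2: normalize types, then emit (join key, tagged value)."""
    yield int(row["customer"]), ("order", float(row["amount"]))


def shuffle(pairs):
    """Group all emitted values by key, as a MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_customer(key, values):
    """Reduce step: merge the records of one key into a single unified row."""
    name = next((v for tag, v in values if tag == "customer"), None)
    total = sum(v for tag, v in values if tag == "order")
    return {"cust_id": key, "name": name, "total_spent": total}


def unified_view():
    """Run map, shuffle, and reduce over both sources and return the merged rows."""
    pairs = []
    for row in CUSTOMERS_DB:
        pairs.extend(map_customers(row))
    for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
        pairs.extend(map_orders(row))
    groups = shuffle(pairs)
    return [reduce_customer(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    for row in unified_view():
        print(row)
```

In a real deployment the map functions would run in parallel near each data source and the shuffle would be handled by the MapReduce framework; the sketch keeps everything in one process so the data flow is easy to follow.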