Join and multi-join processing in data integration systems

Authors:
Kian-Lee Tan;Pin Kwang Eng;Beng Chin Ooi;Ming Zhang
Affiliations:
Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2, Singapore;Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2, Singapore;Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2, Singapore;Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2, Singapore
Venue:
Data & Knowledge Engineering
Year:
2002

Abstract

Query processing in a data integration system is complicated by a lack of quality statistics about the data, unpredictable and bursty data transfer rates, and slow or unavailable data sources. Conventional query processing algorithms, which are based on a blocking execution model, are no longer attractive because of their long initial response time. Moreover, the execution engine may be stalled by slow data delivery rates or unavailable data sources. In this paper, we adopt a non-blocking execution model for evaluating queries. We propose a symmetric partition-based join algorithm, called AJoin, that can operate with a small memory requirement, produce the first few answer tuples quickly, and block only when all available data have been examined. We also examine heuristics to manage the partitions and address the memory management issues of AJoin. To evaluate multi-join query plans, we also propose two new strategies, m-AJoin and Pm-AJoin. Both strategies evaluate each join operation using AJoin. While m-AJoin accesses the data from remote sources in their entirety, Pm-AJoin accesses remote data in chunks of smaller partitions. Our performance study shows the effectiveness of the proposed approaches for join and multi-join processing in a multi-user data integration system.
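
To make the non-blocking, symmetric execution model concrete, the following is a minimal sketch of a generic in-memory symmetric hash join. It is not AJoin itself: the abstract does not give the algorithm's details, and this sketch omits partitioning, memory management, and any spilling of partitions to disk. The function name `symmetric_hash_join`, the side labels `"L"` and `"R"`, and the row layout are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Row]], key: str
) -> Iterator[Tuple[Row, Row]]:
    """Yield (left_row, right_row) pairs as soon as both halves have arrived.

    `arrivals` is a single interleaved stream of (side, row) items, where
    side is "L" or "R" (hypothetical labels). Rows may arrive in any order
    and at any rate from either source.
    """
    # One hash table per input; both inputs are treated the same way.
    tables = {"L": defaultdict(list), "R": defaultdict(list)}

    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        k = row[key]

        # Step 1: insert the new row into its own side's table.
        tables[side][k].append(row)

        # Step 2: probe the opposite table and emit matches immediately,
        # so output does not wait for either input to finish.
        for match in tables[other].get(k, []):
            yield (row, match) if side == "L" else (match, row)
```

Because the function is a generator, a consumer receives the first answer tuple as soon as the first matching pair has arrived, and the join stalls only when the arrival stream itself has nothing more to deliver. Keeping both hash tables fully in memory is the main simplification here; the memory-management heuristics described in the abstract are outside the scope of this sketch.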