Many state-of-the-art join techniques require the input relations to be almost fully sorted before the actual join processing starts. Thus, these techniques start producing first results only after a considerable time period has passed. This blocking behaviour is a serious problem when subsequent operators have to stop processing in order to wait for the first results of the join. Furthermore, this behaviour is not acceptable if the result of the join is visualized and/or requires user interaction. These are typical scenarios for data mining applications. The 'off-time' of existing techniques even increases with growing problem sizes. In this paper, we propose a generic technique called Progressive Merge Join (PMJ) that eliminates the blocking behaviour of sort-based join algorithms. The basic idea behind PMJ is to have the join produce results as soon as the external mergesort generates initial runs. Hence, it is possible for PMJ to return first results very early. This paper provides the basic algorithms and the generic framework of PMJ, as well as use cases for different types of joins. Moreover, we provide a generic online selectivity estimator with probabilistic quality guarantees. For similarity joins in particular, the first non-blocking join algorithms are derived by applying PMJ to the state-of-the-art techniques. We have implemented PMJ as part of an object-relational cursor algebra. A set of experiments shows that a substantial number of results is produced even before the input relations would have been sorted. We observed only a moderate increase in the total runtime compared to the blocking counterparts.
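The basic idea described above can be illustrated with a small in-memory sketch for an equi-join. This is not the implementation from the paper: the function names (`progressive_merge_join`, `_join_sorted`), the `chunk_size` parameter, the use of Python lists in place of runs on disk, and the pairwise run-against-run merge phase are all assumptions made for illustration. The sketch joins each pair of freshly sorted chunks immediately, which yields early results, and later joins only runs with different indices so that no result is reported twice.

```python
from itertools import islice


def _join_sorted(rs, ss, key_r, key_s):
    """Merge-join two lists that are already sorted by their join key."""
    i = j = 0
    while i < len(rs) and j < len(ss):
        kr, ks = key_r(rs[i]), key_s(ss[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Locate the group of equal keys in ss, then pair it
            # with every tuple of the matching group in rs.
            j_end = j
            while j_end < len(ss) and key_s(ss[j_end]) == kr:
                j_end += 1
            while i < len(rs) and key_r(rs[i]) == kr:
                for t in ss[j:j_end]:
                    yield (rs[i], t)
                i += 1
            j = j_end


def progressive_merge_join(r, s, key_r, key_s, chunk_size):
    """Yield matching (r_tuple, s_tuple) pairs, starting during run generation.

    Hypothetical sketch: `chunk_size` models the memory available per input,
    and the `runs` list stands in for sorted runs written to disk.
    """
    r_it, s_it = iter(r), iter(s)
    runs = []
    # Phase 1: build sorted runs and join each pair of chunks right away.
    while True:
        r_chunk = sorted(islice(r_it, chunk_size), key=key_r)
        s_chunk = sorted(islice(s_it, chunk_size), key=key_s)
        if not r_chunk and not s_chunk:
            break
        yield from _join_sorted(r_chunk, s_chunk, key_r, key_s)
        runs.append((r_chunk, s_chunk))
    # Phase 2 (simplified): join runs with different indices only,
    # because same-index pairs were already reported in phase 1.
    for i, (r_chunk, _) in enumerate(runs):
        for j, (_, s_chunk) in enumerate(runs):
            if i != j:
                yield from _join_sorted(r_chunk, s_chunk, key_r, key_s)
```

Because the function is a generator, a consumer can pull the first matches after only `chunk_size` tuples of each input have been read, for example with `next(progressive_merge_join(r, s, key, key, 2))`. A real external implementation would replace the nested loop of phase 2 with a multiway merge over the runs.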