Traditional database query processing relies on three types of algorithms for join and for grouping operations. For joins, index nested loops join exploits an index on its inner input, merge join exploits sorted inputs, and hash join exploits differences in the sizes of the join inputs. For grouping, an index-based algorithm has been used in the past whereas today sort- and hash-based algorithms prevail. Cost-based query optimization chooses the most appropriate algorithm for each query and for each operation. Unfortunately, mistaken algorithm choices during compile-time query optimization are common yet expensive to investigate and to resolve.

Our goal is to end mistaken choices among join algorithms and among grouping algorithms by replacing the three traditional types of algorithms with a single one. Like merge join, this new join algorithm exploits sorted inputs. Like hash join, it exploits different input sizes for unsorted inputs. In fact, for unsorted inputs, the cost functions for recursive hash join and for hybrid hash join have guided our search for the new join algorithm. In consequence, the new join algorithm can replace both merge join and hash join in a database management system.

The in-memory components of the new join algorithm employ indexes. If the database contains indexes for one (or both) of the inputs, the new join can exploit persistent indexes instead of temporary in-memory indexes. Using database indexes to find matching input records, the new join algorithm can also replace index nested loops join.

In addition to join operations, a very similar algorithm supports grouping ("group by" queries in SQL) and duplicate elimination. For unsorted inputs, candidate output records take on the role of one of the inputs in a join operation. Our goal is to define a single grouping algorithm that can replace grouping by repeated index searches, by sorting, and by hashing. In other words, our goal is to end mistaken algorithm choices not only for joins and other binary matching operations but also for grouping and other unary matching operations in database query processing.

Finally, these new algorithms can be instrumental for efficient and robust data processing in a map-reduce environment, because "map" and "reduce" operations are similar in essentials to join and grouping operations.

Results from an implementation of the core algorithm are reported.
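
For orientation, the three traditional join algorithms named in the abstract can be sketched in a few lines of Python. This is only an in-memory illustration of the textbook techniques, not the new unified algorithm the paper proposes. The function names, the row layout (tuples whose first field is the join key), and the use of a dictionary as a stand-in for a database index are all assumptions made for this sketch.

```python
from collections import defaultdict


def index_nested_loops_join(outer, inner_index):
    """For each outer row, look up matches in an index on the inner input.

    inner_index is a dict mapping join key -> list of inner rows; it is a
    hypothetical stand-in for a persistent B-tree index.
    """
    return [(o, i) for o in outer for i in inner_index.get(o[0], [])]


def merge_join(left, right):
    """Join two inputs that are already sorted on the join key (row[0])."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == key:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def hash_join(build, probe):
    """Build a hash table on the (ideally smaller) input, probe with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[0]].append(b)
    return [(b, p) for p in probe for b in table.get(p[0], [])]
```

All three produce the same set of matching pairs; they differ in what they require of their inputs (an index, a sort order, or enough memory for the smaller input), which is why a compile-time choice among them can go wrong when input sizes or orderings are misestimated.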