Computational Complexity of Sorting and Joining Relations with Duplicates

Authors: M. Abdelguerfi; A. K. Sood
Venue: IEEE Transactions on Knowledge and Data Engineering
Year: 1991

Abstract

It is shown that duplicate values in attribute columns have a significant impact on the computational complexity of the sorting and joining operations, especially when the number of distinct tuple values is a small fraction of the total number of tuples. The authors characterize a multirelation M(n, L) by its cardinality n and the number L of distinct elements it contains. Under this characterization, they investigate the worst-case time complexity of sorting such a multirelation with binary comparisons as the basic operation, and establish upper and lower bounds on the number of three-branch comparisons needed to sort it. The methodology used to study sorting is then applied to the natural join operation, showing that duplicate values in the join attribute columns can be exploited to reduce the computational complexity of the join.
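
To make the idea concrete, here is a minimal sketch of how duplicates might be exploited. It is an illustration under stated assumptions, not the authors' algorithm, and it does not reproduce the bounds from the paper. The function names `sort_multirelation` and `natural_join` are hypothetical. The sort keeps a sorted list of the distinct values seen so far and binary-searches it with three-branch (<, =, >) comparisons, so each tuple costs roughly log2(L) comparisons rather than log2(n). The join groups rows by join value with a dictionary, which is a swapped-in hashing technique outside the comparison model of the abstract; it only shows that matching can be done once per distinct value.

```python
def sort_multirelation(tuples):
    """Sort a list with many duplicates, counting three-branch comparisons.

    Assumption: values are mutually comparable. Returns the sorted list and
    the number of three-branch comparisons performed.
    """
    distinct = []     # sorted distinct values seen so far (at most L entries)
    counts = []       # counts[i] is the multiplicity of distinct[i]
    comparisons = 0
    for t in tuples:
        lo, hi = 0, len(distinct)
        found = False
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1              # one three-branch comparison
            if t == distinct[mid]:        # branch "="
                counts[mid] += 1
                found = True
                break
            elif t < distinct[mid]:       # branch "<"
                hi = mid
            else:                         # branch ">"
                lo = mid + 1
        if not found:
            # Insertion moves data but performs no further comparisons.
            distinct.insert(lo, t)
            counts.insert(lo, 1)
    result = []
    for value, c in zip(distinct, counts):
        result.extend([value] * c)
    return result, comparisons


def natural_join(r, s, key_r=0, key_s=0):
    """Join two lists of tuples on one attribute, grouping duplicates first.

    Assumption: rows are tuples and the join attribute sits at index key_r
    in r and key_s in s. Grouping uses dictionaries (hashing), not comparisons.
    """
    groups_r = {}
    for row in r:
        groups_r.setdefault(row[key_r], []).append(row)
    groups_s = {}
    for row in s:
        groups_s.setdefault(row[key_s], []).append(row)
    out = []
    for k, rows_r in groups_r.items():
        rows_s = groups_s.get(k, [])      # looked up once per distinct value
        for a in rows_r:
            for b in rows_s:
                # Keep the join attribute only once in the output row.
                out.append(a + b[:key_s] + b[key_s + 1:])
    return out
```

For example, `sort_multirelation([3, 1, 2, 3, 1, 2, 3, 3])` returns the sorted list `[1, 1, 2, 2, 3, 3, 3, 3]` together with a comparison count that grows with the number of distinct values (here L = 3) rather than with n = 8.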