The performance of external sorting based on merge sort depends heavily on the length of the generated runs. Replacement selection (RS) is one of the most commonly used run generation strategies because, on average, it generates runs twice the size of the available memory. However, RS produces shorter runs for data with certain characteristics, such as inputs sorted in the reverse of the desired output order. This paper proposes and analyzes two-way replacement selection (2WRS), a generalization of RS that uses two heaps instead of the single heap used by RS. Appropriate management of these two heaps allows 2WRS to generate runs larger than the available memory in a stable way, i.e., independently of the characteristics of the dataset. Depending on the changing characteristics of the input, 2WRS assigns each new record to one heap or the other, and grows or shrinks each heap to follow the increasing or decreasing tendency of the data. On average, 2WRS creates runs at least as long as those generated by RS, and longer ones for datasets that combine increasing and decreasing subsets. We tested both algorithms on large datasets with different characteristics: 2WRS achieves speedups at least similar to those of RS, and speedups of over 2.5 when RS fails to generate large runs.
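To make the baseline concrete, here is a minimal sketch of classic single-heap replacement selection, the RS strategy the abstract compares against. It is not the paper's 2WRS algorithm. The function name `replacement_selection`, the `memory` parameter (heap capacity in records), and the use of plain comparable keys as records are assumptions made for illustration.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking the end of the input stream


def replacement_selection(records: Iterable[int], memory: int) -> Iterator[List[int]]:
    """Yield sorted runs using classic single-heap replacement selection.

    Illustrative sketch only: `memory` is the number of records that fit
    in the heap, and records are assumed to be plain comparable keys.
    """
    if memory < 1:
        raise ValueError("memory must hold at least one record")

    it = iter(records)
    # Heap entries are (run_number, key): records deferred to the next run
    # carry a larger run number, so they sort after the whole current run.
    heap = [(0, key) for key in islice(it, memory)]
    heapq.heapify(heap)

    current_run = 0
    run: List[int] = []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no != current_run:
            # Only deferred records remain in the heap: close the current run.
            yield run
            run = []
            current_run = run_no
        run.append(key)

        nxt = next(it, _END)
        if nxt is not _END:
            # A record smaller than the key just written cannot extend the
            # current run, so it is tagged for the next run.
            target = current_run if nxt >= key else current_run + 1
            heapq.heappush(heap, (target, nxt))

    if run:
        yield run
```

With `memory=3`, an already sorted input such as `range(10)` comes out as a single run of ten records, while the strictly decreasing input `range(9, -1, -1)` is split into runs of lengths 3, 3, 3 and 1. In that reverse-sorted case no run exceeds the memory size, which is the weakness of RS that motivates 2WRS.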