Main Memory-Based Algorithms for Efficient Parallel Aggregation for Temporal Databases

  • Authors:
  • Dengfeng Gao, Jose Alvin G. Gendrano, Bongki Moon, Richard Thomas Snodgrass, Minseok Park, Bruce C. Huang, Jim M. Rodrigue

  • Affiliations:
  • Computer Science Department, University of Arizona, Tucson, AZ 85721-0077, USA (all authors; contact: rts@cs.arizona.edu)

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2004

Abstract

The ability to model the temporal dimension is essential to many applications. Furthermore, the growth in database sizes and the increasing stringency of response-time requirements have outpaced advances in processor and mass-storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for the shared-nothing architecture; these algorithms are based on the sequential Aggregation Tree algorithm. We are particularly interested in developing parallel algorithms that can maximally exploit available memory to quickly compute large-scale temporal aggregates without intermediate disk writes and reads. Through an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation all influenced the performance of the algorithms. For distributed result placement, we found Greedy Time Division Merge to be the clear choice. For centralized results and high data reduction, Pairwise Merge was preferred for a large number of processing nodes; for low data reduction, it performed well only up to 32 nodes. This led us to a centralized variant of Greedy Time Division Merge, which was best for the remaining cases. We present a cost model that closely predicts the running time of Greedy Time Division Merge.
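
To make the notion of a temporal aggregate concrete, the sketch below computes a temporal COUNT over valid-time intervals by sweeping interval endpoints, yielding one value per maximal interval on which the count is constant. This is only an illustrative sequential sketch under assumed conventions (half-open integer intervals, a hypothetical `temporal_count` function); it is not the paper's Aggregation Tree algorithm nor any of its parallel variants such as Greedy Time Division Merge or Pairwise Merge.

```python
# Illustrative sketch only: a temporal COUNT via an endpoint sweep.
# Not the paper's Aggregation Tree algorithm or its parallel variants;
# interval representation and function names are assumptions.

from collections import defaultdict

def temporal_count(tuples):
    """tuples: iterable of (start, end) half-open validity intervals [start, end).
    Returns a list of (start, end, count) constant-count intervals."""
    deltas = defaultdict(int)
    for start, end in tuples:
        deltas[start] += 1   # a tuple becomes valid at `start`
        deltas[end] -= 1     # and ceases to be valid at `end`

    result = []
    count = 0
    prev = None
    for t in sorted(deltas):
        if prev is not None and count > 0:
            result.append((prev, t, count))
        count += deltas[t]
        prev = t
    return result

if __name__ == "__main__":
    # Three tuples with overlapping valid times.
    rows = [(1, 10), (5, 15), (8, 12)]
    for start, end, cnt in temporal_count(rows):
        print(f"[{start}, {end}): count = {cnt}")
```

In a shared-nothing setting of the kind the paper studies, each node would produce such constant-value intervals over its own data partition, and the partial results would then be merged (either at a single coordinator or across the nodes, depending on whether result placement is centralized or distributed).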