We investigate runtime strategies for data-intensive applications that involve generalized reductions on large, distributed datasets. Our set of strategies includes replicated filter state, partitioned filter state, and hybrid options between these two extremes. We evaluate these strategies using emulators of three real applications, different query and output sizes, and a number of configurations. We consider execution in a homogeneous cluster and in a distributed environment where only a subset of nodes host the data. Our results show that replicating the filter state scales well and outperforms the other schemes if sufficient memory is available and sufficient computation is involved to offset the cost of the global merge step. In other cases, hybrid is usually the best. Moreover, in almost all cases, the performance of the hybrid strategy is quite close to that of the best strategy. Thus, we believe that hybrid is an attractive approach when the relative performance of different schemes cannot be predicted.
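To make the two extremes concrete, the following is a minimal single-process sketch, not the implementation evaluated here. It assumes the reduction is a per-key sum, that the "filter state" is the accumulator holding those sums, and that chunks are assigned to workers round-robin; the function names `replicated_reduce` and `partitioned_reduce` are hypothetical. The hybrid options are omitted because the abstract does not specify how they are constructed.

```python
from collections import Counter


def replicated_reduce(chunks, num_workers):
    """Replicated state: every worker holds a full private accumulator."""
    local = [Counter() for _ in range(num_workers)]
    for i, chunk in enumerate(chunks):
        worker = i % num_workers             # assumed round-robin assignment
        for key, value in chunk:
            local[worker][key] += value      # purely local update
    merged = Counter()
    for acc in local:                        # global merge step
        merged.update(acc)                   # Counter.update adds values per key
    return dict(merged)


def partitioned_reduce(chunks, num_workers):
    """Partitioned state: each worker owns a disjoint slice of the accumulator."""
    owned = [Counter() for _ in range(num_workers)]
    for chunk in chunks:
        for key, value in chunk:
            owner = hash(key) % num_workers  # update is routed to the owning worker
            owned[owner][key] += value
    result = {}
    for acc in owned:                        # slices are disjoint, so no merge is needed
        result.update(acc)
    return result
```

In this sketch the replicated variant uses memory proportional to the number of workers times the output size and pays for a final merge, while the partitioned variant stores each output element once but would, in a real distributed setting, communicate every update whose owner is a different node.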