Principles for designing data-/compute-intensive distributed applications and middleware systems for heterogeneous environments

Authors:
Jik-Soo Kim; Henrique Andrade; Alan Sussman
Affiliations:
Department of Computer Science, University of Maryland, College Park, MD 20742, USA; IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA; Department of Computer Science, University of Maryland, College Park, MD 20742, USA
Venue:
Journal of Parallel and Distributed Computing
Year:
2007

Abstract

The nature of distributed systems changes steadily as the hardware and software landscape evolves. Porting applications and adapting existing middleware systems to ever-changing computational platforms has become increasingly complex and expensive. Therefore, the design of applications, as well as the design of next-generation middleware systems, must follow a set of guiding principles to ensure long-term "survivability" without costly re-engineering. In our practical experience, the key determinant of success in this endeavor is adherence to the following principles: (1) design for change; (2) provide for storage subsystem I/O coordination; (3) employ workload partitioning and load balancing techniques; (4) employ caching; (5) schedule the workload; and (6) understand the workload. To support these principles, we have collected extensive experimental results comparing, on a single data- and compute-intensive application, three middleware systems targeted at data- and compute-intensive applications that our research group implemented over the last decade. The main contribution of this work is an analysis on a level playing field, in which we discuss and quantify how adherence to these guiding principles affects overall system throughput and response time.