Abstract: Emerging accelerator-based heterogeneous clusters, which comprise specialized processors such as the IBM Cell and GPUs, have exhibited an excellent price-to-performance ratio as well as high energy efficiency. However, developing and maintaining software for such systems is fraught with challenges, especially for modern high-performance computing (HPC) applications, which stand to benefit the most from accelerators. If accelerator-based clusters are to deliver on their initial promise of a viable and cost-effective HPC solution for researchers and practitioners, a software solution is needed that lowers the barrier to entry for the average user. In this paper, we investigate how a software-component-based approach can provide a reusable and adaptable architecture for executing HPC tasks on accelerator-based clusters. Our implementation draws on lessons from software engineering research on component-based layered architectures. Our results indicate that the complexity of developing and maintaining accelerator-based cluster software can be tamed by solid software engineering approaches as effectively as that of software in more traditional domains. Specifically, we were able to reuse 83.6% of our implementation code across different architectures and resource configurations, while achieving overall execution performance within 1.5% of that of an optimally hand-tuned, albeit non-reusable, version.