The parallel programming paradigm based on migratable objects, as embodied in Charm++, improves programmer productivity by automating resource management. The programmer decomposes an application into a large number of parallel objects, and an intelligent run-time system assigns those objects to processors, migrating them among processors to effect dynamic load balancing and communication optimizations. In addition, having multiple sets of objects representing distinct computations leads to improved modularity and performance. However, for complex applications involving many sets of objects, Charm++'s programming model tends to obscure the global flow of control in a parallel program: one must read the code of multiple objects to discern how the sets of objects are orchestrated in a given application. In this paper, we present Charisma, an orchestration notation that allows expression of Charm++ functionality without fragmenting the expression of control flow. Charisma separates the expression of parallelism, including control flow and macro data-flow, from the sequential components of the program; the sequential components only consume and publish data. Charisma supports expression of multiple patterns of communication among message-driven objects, and a compiler generates the corresponding Charm++ communication and synchronization code via static dependence analysis. Because Charisma outputs standard Charm++ code, the functionality and performance benefits of the adaptive run-time system, such as automatic load balancing, are retained. We show that Charisma programs scale up to 1,024 processors without introducing undue overhead.
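As a rough illustration of the orchestration style the abstract describes, the sketch below gives a Charisma-like program for a one-dimensional stencil (Jacobi) computation. It is a minimal sketch modeled on the notation's published examples, not a verbatim excerpt: the class, object, and parameter names (JacobiWorker, workers, lb, rb) and the exact keywords are illustrative assumptions. The point it shows is that the global control flow, a time-step loop around a foreach over the worker objects, is stated in one place, while the sequential methods produceBorders and compute only publish and consume values; the Charisma compiler would turn the producer/consumer pairs into Charm++ messaging and synchronization.

    program jacobi1d
        class JacobiWorker : ChareArray1D;   // sequential class; methods publish/consume data
        obj   workers : JacobiWorker[64];    // a set of migratable parallel objects
        param lb : double[512];              // values exchanged between objects each step
        param rb : double[512];

    begin
        for iter = 1 to MAX_ITER
            foreach i in workers
                // each worker publishes its left and right border strips ...
                (lb[i], rb[i]) <- workers[i].produceBorders();
                // ... and consumes its neighbors' borders; the compiler
                // generates the Charm++ communication from this dependence
                workers[i].compute(lb[i+1], rb[i-1]);
            end-foreach
        end-for
    end

Note that the notation never names processors: the generated code is standard Charm++ over migratable objects, so the adaptive run-time system remains free to move workers for load balance.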