The design, implementation, and evaluation of Jade
ACM Transactions on Programming Languages and Systems (TOPLAS)
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
SIAM Journal on Scientific Computing
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Region-based memory management in cyclone
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Ownership, encapsulation and the disjointness of type and effect
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Ownership types for object encapsulation
POPL '03 Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Separation Logic: A Logic for Shared Mutable Data Structures
LICS '02 Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science
Composable memory transactions
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Parallel Programmability and the Chapel Language
International Journal of High Performance Computing Applications
A type and effect system for deterministic parallel Java
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Safe nondeterminism in a deterministic-by-default parallel language
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Programming the memory hierarchy revisited: supporting irregular parallelism in sequoia
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community
Computing in Science and Engineering
Benchmarking modern multiprocessors
Benchmarking modern multiprocessors
Hierarchical place trees: a portable abstraction for task parallelism and data movement
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Legion: expressing locality and independence with logical regions
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Applications written for distributed-memory parallel architectures must partition their data to enable parallel execution. As memory hierarchies become deeper, it is increasingly necessary that the data partitioning also be hierarchical to match. Current language proposals perform this hierarchical partitioning statically, which excludes many important applications where the appropriate partitioning is itself data dependent and so must be computed dynamically. We describe Legion, a region-based programming system, where each region may be partitioned into subregions. Partitions are computed dynamically and are fully programmable. The division of data need not be disjoint and subregions of a region may overlap, or alias one another. Computations use regions with certain privileges (e.g., expressing that a computation uses a region read-only) and data coherence (e.g., expressing that the computation need only be atomic with respect to other operations on the region), which can be controlled on a per-region (or subregion) basis. We present the novel aspects of the Legion design, in particular the combination of static and dynamic checks used to enforce soundness. We give an extended example illustrating how Legion can express computations with dynamically determined relationships between computations and data partitions. We prove the soundness of Legion's type system, and show Legion type checking improves performance by up to 71% by eliding provably safe memory checks. In particular, we show that the dynamic checks to detect aliasing at runtime at the region granularity have negligible overhead. We report results for three real-world applications running on distributed memory machines, achieving up to 62.5X speedup on 96 GPUs on the Keeneland supercomputer.