This paper addresses the underlying sources of performance degradation (e.g., latency, overhead, and starvation) and the obstacles to programmer productivity (e.g., explicit locality management and scheduling, performance tuning, fragmented memory, and synchronous global barriers), with the goal of dramatically enhancing the effectiveness of parallel processing for high-end computing (HEC). We are developing a hierarchical threaded virtual machine (HTVM) that defines a dynamic, multithreaded execution model and programming model, providing an architecture abstraction for HEC system software and tools development. We are working on a prototype language, LITL-X (pronounced "little-X"), for Latency Intrinsic-Tolerant Language, which gives application programmers a powerful set of semantic constructs for organizing parallel computations so as to hide or manage latency and limit the effects of overhead. This approach is quite different from locality management, although both strategies aim to minimize the effect of latency on the efficiency of computation. We will develop a dynamic compilation and runtime model to achieve efficient LITL-X program execution, study several adaptive optimizations, and study a methodology for incorporating domain-specific knowledge into program optimization. Finally, we plan to implement our method in an experimental testbed for an HEC architecture and perform a qualitative and quantitative evaluation on selected applications.