ICDCS '99 Proceedings of the 19th IEEE International Conference on Distributed Computing Systems
Networks of workstations are fast becoming the standard environment for parallel applications. However, using "found" resources as a platform for tightly coupled runtime environments faces at least three obstacles: contention for resources, differing processor speeds, and processor heterogeneity. All three cause load imbalance, which leads to poor performance for scientific applications. This paper describes the use of thread migration to address this load imbalance transparently in the context of the CVM software distributed shared memory system. We describe the implementation and performance of mechanisms and policies that accommodate both resource contention and heterogeneity in clock speed and processor type. Our results show that the cycles of these found resources can indeed be exploited effectively, and that the runtime cost of processor heterogeneity can be quite manageable. Along the way, however, we identify a number of problems that need to be addressed before such systems can enjoy widespread use.