In software distributed shared memory systems, migrating threads from heavily loaded nodes to lightly loaded nodes balances load, but if the threads to migrate are carelessly selected, the communication cost of maintaining data consistency increases. When the loss from this added communication exceeds the benefit of load balancing, program performance degrades. This study addresses the problem with a novel selection policy that reduces internode sharing costs. The main characteristic of this policy is that it considers thread memory access types and global sharing simultaneously. Experimental results show that the policy reduces the communication of benchmark applications during load balancing by 50%.
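The core idea of a sharing-cost-aware selection policy can be sketched as follows. This is an illustrative reconstruction under assumed names and a simplified page-sharing cost model, not the paper's actual algorithm: each candidate thread is scored by how much internode sharing its migration would create with threads remaining on the source node, minus how much sharing it would localize by joining threads on the destination node.

```python
# Hypothetical sketch of a sharing-cost-aware migration-thread selector.
# Thread names, the pages_of mapping, and the cost model are all
# illustrative assumptions, not the paper's implementation.

def sharing_cost_delta(thread, src_threads, dst_threads, pages_of):
    """Estimate the change in internode sharing cost if `thread`
    migrates from the source node to the destination node.

    pages_of maps a thread id to the set of shared pages it accesses.
    """
    mine = pages_of[thread]
    # Pages also used by threads staying on the source node become
    # internode-shared after migration (cost goes up).
    cost_up = sum(len(mine & pages_of[t])
                  for t in src_threads if t != thread)
    # Pages also used by threads already on the destination node stop
    # being internode-shared (cost goes down).
    cost_down = sum(len(mine & pages_of[t]) for t in dst_threads)
    return cost_up - cost_down


def select_migration_thread(src_threads, dst_threads, pages_of):
    """Pick the thread whose migration least increases internode sharing."""
    return min(src_threads,
               key=lambda t: sharing_cost_delta(
                   t, src_threads, dst_threads, pages_of))
```

For example, a thread that shares no pages with its node-mates is a cheap migration candidate, while a thread tightly coupled to local threads is expensive; a real policy would additionally weight each page by its access type (read-shared versus write-shared), since write sharing triggers far more consistency traffic.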