In current computer architectures, the performance of communication between threads varies depending on the memory hierarchy. This performance difference must be considered when mapping parallel applications to processor cores. In parallel applications based on the shared memory paradigm, communication is difficult to detect because it is implicit. Furthermore, dynamic mapping introduces several challenges, since a suitable mapping must be found and the threads migrated with low overhead while the application is running. We propose a mechanism that detects the communication pattern of shared memory applications by monitoring cache coherence protocols. We also propose heuristics that, combined with our communication detection mechanism, allow the operating system to perform the mapping dynamically. Experiments with the NAS Parallel Benchmarks showed reductions of up to 13.9% in execution time, 30.5% in cache misses, and 39.4% in the number of invalidation messages.
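The general idea of turning coherence activity into a thread mapping can be illustrated with a minimal sketch. This is not the mechanism or the heuristics proposed in the paper, whose details the abstract does not give. It assumes a hypothetical event stream of (writer thread, sharer thread) pairs, one per observed invalidation message, and swaps in a simple greedy pairing as a stand-in heuristic. The names `build_comm_matrix` and `greedy_pairing` are illustrative only.

```python
from itertools import combinations


def build_comm_matrix(num_threads, events):
    """Count communication events between every pair of threads.

    `events` is an assumed input format: an iterable of
    (writer_thread, sharer_thread) pairs, one per invalidation message.
    """
    matrix = [[0] * num_threads for _ in range(num_threads)]
    for writer, sharer in events:
        if writer != sharer:  # a thread does not communicate with itself
            matrix[writer][sharer] += 1
            matrix[sharer][writer] += 1
    return matrix


def greedy_pairing(matrix):
    """Pair threads greedily by descending communication count.

    Each returned pair is a candidate for placement on two cores that
    share a cache. This is a stand-in heuristic, not an optimal matching;
    with an odd number of threads, one thread is left unpaired.
    """
    n = len(matrix)
    candidates = sorted(
        combinations(range(n), 2),
        key=lambda pair: matrix[pair[0]][pair[1]],
        reverse=True,
    )
    placed = set()
    pairs = []
    for a, b in candidates:
        if a not in placed and b not in placed:
            pairs.append((a, b))
            placed.update((a, b))
    return pairs


# Hypothetical trace: threads 0 and 2 communicate most, then 1 and 3.
events = [(0, 2)] * 5 + [(1, 3)] * 4 + [(0, 1)]
comm = build_comm_matrix(4, events)
mapping = greedy_pairing(comm)
print(mapping)
```

In this toy trace, the pairing groups threads 0 and 2 together and threads 1 and 3 together, so the heaviest communication would stay within a shared cache. A dynamic scheme would have to rebuild the matrix periodically and weigh the expected gain against the cost of migrating threads.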