Performance analysis of methods that overcome false sharing effects in software DSMs

Authors:
Manjunath Kudlur;R. Govindarajan
Affiliations:
Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560 012, India and Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arb ...;Department of Computer Science and Automation, Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
Venue:
Journal of Parallel and Distributed Computing
Year:
2004


Abstract

Page-based software DSMs experience a high degree of false sharing, especially in irregular applications with fine-grain sharing granularity. The overhead due to false sharing is considered a dominant factor limiting the performance of software DSMs. Several approaches have been proposed in the literature to reduce or eliminate false sharing. In this paper, we evaluate two of these approaches: the Multiple Writer approach and the emulated fine-grain sharing (EmFiGS) approach. Our evaluation strategy is two-pronged. First, we use an implementation-independent analysis that uses overhead counts to compare the different approaches. Our analysis shows that, in the EmFiGS approach, the benefits gained by eliminating false sharing are far outweighed by the performance penalty incurred due to the reduced exploitation of spatial locality. As a consequence, any implementation of the EmFiGS approach is likely to perform significantly worse than the Multiple Writer approach. Second, we use experimental evaluation to validate and complement our analysis. The experimental results match well with our analysis, and the execution times of the applications follow the same trend, reinforcing our conclusions. More specifically, the EmFiGS approach performs significantly worse than the Multiple Writer approach, by a factor of 1.5 to as much as 90. In many cases, the EmFiGS approach performs worse than even a single-writer lazy release protocol, which experiences very high overheads due to false sharing. The performance of the EmFiGS approach remains worse than that of the Multiple Writer approach even after incorporating Tapeworm, a record-and-replay technique that fetches pages ahead of demand in an aggregated fashion, to alleviate the spatial locality effect. We next present the effect of asynchronous message handling on the performance of the different methods. Finally, we investigate the interplay between spatial locality exploitation and false sharing elimination with varying sharing granularities in the EmFiGS approach and report the tradeoffs.
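
The tension between false sharing and spatial locality described in the abstract can be illustrated with a toy model. The sketch below is not the paper's overhead-count analysis; the workload, the function names (`falsely_shared_blocks`, `fetches_for_range`), and the block sizes are all hypothetical, chosen only to show that shrinking the coherence block removes false sharing while increasing the number of fetches needed to read a contiguous region.

```python
from collections import defaultdict


def falsely_shared_blocks(writes, block_size):
    """Count coherence blocks written by several processors even though
    no single address in the block is written by more than one of them."""
    writers_by_block = defaultdict(set)
    writers_by_addr = defaultdict(set)
    for proc, addr in writes:
        writers_by_block[addr // block_size].add(proc)
        writers_by_addr[addr].add(proc)
    # Blocks that contain at least one genuinely shared address.
    truly_shared = {
        addr // block_size
        for addr, procs in writers_by_addr.items()
        if len(procs) > 1
    }
    return sum(
        1
        for block, procs in writers_by_block.items()
        if len(procs) > 1 and block not in truly_shared
    )


def fetches_for_range(start, length, block_size):
    """Number of block fetches needed to read a contiguous byte range."""
    first = start // block_size
    last = (start + length - 1) // block_size
    return last - first + 1


# Hypothetical workload: 4 processors, each writing its own contiguous
# 1024-byte chunk of a 4096-byte array in 8-byte words.
writes = [(p, p * 1024 + off) for p in range(4) for off in range(0, 1024, 8)]

for block_size in (4096, 1024, 256, 64):
    print(
        block_size,
        falsely_shared_blocks(writes, block_size),
        fetches_for_range(0, 4096, block_size),
    )
```

In this toy workload, a 4096-byte block is falsely shared by all four writers but lets a reader fetch the whole array at once, whereas a 64-byte block has no false sharing but requires 64 separate fetches for the same read.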