Logic verification algorithms and their parallel implementation
DAC '87 Proceedings of the 24th ACM/IEEE Design Automation Conference
Analysis of interprocedural side effects in a parallel programming environment
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Interprocedural side-effect analysis in linear time
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Memory-reference characteristics of multiprocessor applications under MACH
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Simple but effective techniques for NUMA memory management
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
IEEE Transactions on Computers
Techniques for efficient inline tracing on a shared-memory multiprocessor
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Simplicity Versus Accuracy in a Model of Cache Coherency Overhead
IEEE Transactions on Computers
The Stanford Dash Multiprocessor
Computer
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The detection and elimination of useless misses in multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A parallel adaptive fast multipole method
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Using compile-time analysis and transformations to reduce false sharing on shared-memory multiprocessors
A practical interprocedural data flow analysis algorithm
Communications of the ACM
A precise inter-procedural data flow algorithm
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An efficient way to find the side effects of procedure calls and the aliases of variables
POPL '79 Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Computing Per-Process Summary Side-Effect Information
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Static Analysis of Barrier Synchronization in Explicitly Parallel Programs
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
Optimizing parallel programs with explicit synchronization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Evaluating the impact of advanced memory systems on compiler-parallelized codes
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Characterizing the Memory Behavior of Compiler-Parallelized Applications
IEEE Transactions on Parallel and Distributed Systems
Non-singular data transformations: definition, validity and applications
ICS '97 Proceedings of the 11th international conference on Supercomputing
Tradeoffs between false sharing and aggregation in software distributed shared memory
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
Using prediction to accelerate coherence protocols
Proceedings of the 25th annual international symposium on Computer architecture
An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing
IEEE Transactions on Computers
A Compiler Optimization Algorithm for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving Cache Locality by a Combination of Loop and Data Transformations
IEEE Transactions on Computers - Special issue on cache memory and related problems
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
IEEE Transactions on Parallel and Distributed Systems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
IEEE Transactions on Parallel and Distributed Systems
Nonsingular Data Transformations: Definition, Validity, and Applications
International Journal of Parallel Programming
Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP
IEEE Transactions on Parallel and Distributed Systems
Hoard: a scalable memory allocator for multithreaded applications
ACM SIGPLAN Notices
Improving fine-grained irregular shared-memory benchmarks by data reordering
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Static and Dynamic Locality Optimizations Using Integer Linear Programming
IEEE Transactions on Parallel and Distributed Systems
Reducing coherence overhead of barrier synchronization in software DSMs
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Using high performance GIS software to visualize data: a hands-on software demonstration
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
A proposal for preservice student technology competence
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Integrating loop and data transformations for global optimization
Journal of Parallel and Distributed Computing
OpenMP on networks of workstations for software DSMs
Journal of Computer Science and Technology
A Layout-Conscious Iteration Space Transformation Technique
IEEE Transactions on Computers
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
Improving server software support for simultaneous multithreaded processors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Dynamically Controlling False Sharing in Distributed Shared Memory
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Performance analysis of methods that overcome false sharing effects in software DSMs
Journal of Parallel and Distributed Computing
Journal of Parallel and Distributed Computing
Lightweight reference affinity analysis
Proceedings of the 19th annual international conference on Supercomputing
Whole-program optimization of global variable layout
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Structure Layout Optimization for Multithreaded Programs
Proceedings of the International Symposium on Code Generation and Optimization
Incrementally parallelizing database transactions with thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Speeding-up multiprocessors running DBMS workloads through coherence protocols
International Journal of High Performance Computing and Networking
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Static Detection of Place Locality and Elimination of Runtime Checks
APLAS '08 Proceedings of the 6th Asian Symposium on Programming Languages and Systems
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Assessing cache false sharing effects by dynamic binary instrumentation
Proceedings of the Workshop on Binary Instrumentation and Applications
Dynamic cache contention detection in multi-threaded applications
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Tackling cache-line stealing effects using run-time adaptation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Transactional conflict decoupling and value prediction
Proceedings of the international conference on Supercomputing
SHERIFF: precise detection and automatic mitigation of false sharing
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
A static analysis tool using a three-step approach for data races in HPC programs
Proceedings of the 2012 Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging
Detection of false sharing using machine learning
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.01 |
We have developed compiler algorithms that analyze explicitly parallel programs and restructure their shared data to reduce the number of false sharing misses. The algorithms analyze per-process shared data accesses, pinpoint the data structures that are susceptible to false sharing and choose an appropriate transformation to reduce it. The transformations either group data that is accessed by the same processor or separate individual data items that are shared.This paper evaluates that technique. We show through simulation that our analysis successfully identifies the data structures that are responsible for most false sharing misses, and then transforms them without unduly decreasing spatial locality. The reduction in false sharing positively impacts both execution time and program scalability when executed on a KSR2. Both factors combine to increase the maximum achievable speedup for all programs, more than doubling it for several. Despite being able to only approximate actual inter-processor memory accesses, the compiler-directed transformations always outperform programmer efforts to eliminate false sharing.