The impact of wrong-path memory references in cache-coherent multiprocessor systems
Journal of Parallel and Distributed Computing
High-performance multiprocessor systems built around out-of-order processors with aggressive branch predictors execute many memory references that turn out to be on a mispredicted branch path. Previous work on uniprocessors showed that these wrong-path memory references may pollute the caches by bringing in data that are not needed on the correct execution path and by evicting useful data or instructions. They may also increase cache and memory traffic. On the positive side, however, they may have a prefetching effect for memory references on the correct path. While computer architects have thoroughly studied the impact of wrong-path effects in uniprocessor systems, there is no previous work on their effects in multiprocessor systems. In this paper, we explore the effects of wrong-path memory references on the memory system behavior of shared-memory multiprocessor (SMP) systems with both broadcast and directory-based cache coherence. Our results show that these wrong-path memory references can increase the number of cache-to-cache transfers by 32%, invalidations by 8% and 20% for broadcast and directory-based SMPs, respectively, and the number of writebacks by up to 67% for both systems. In addition to the extra coherence traffic, wrong-path memory references also increase the number of cache line state transitions by 21% and 32% for broadcast and directory-based SMPs, respectively. To reduce the performance impact of these wrong-path memory references, we introduce two simple mechanisms that yield speedups of up to 37%: filtering wrong-path blocks that are unlikely to be used, and wrong-path aware cache replacement.
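The abstract names wrong-path aware cache replacement without describing how it works. The Python sketch below shows one hypothetical way such a policy could behave in a single cache set. It assumes that each line carries a flag recording that it was filled by a wrong-path reference and has not yet been used on the correct path, and that the victim is chosen as the least-recently-used flagged line, falling back to plain LRU. The class `WrongPathAwareSet`, the `wrong_path` flag, and this victim rule are illustrative assumptions, not the paper's mechanism. The filtering mechanism is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)  # identity equality, so list.remove() drops the exact victim object
class Line:
    tag: int
    wrong_path: bool  # assumption: filled on a wrong path and not yet used on the correct path
    last_use: int     # logical timestamp used for LRU ordering


class WrongPathAwareSet:
    """Hypothetical model of one set in a set-associative cache.

    Victim rule (an assumption for illustration): evict the LRU line among
    those still flagged as wrong-path; if none is flagged, use plain LRU.
    """

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines: List[Line] = []
        self.clock = 0

    def access(self, tag: int, on_wrong_path: bool) -> bool:
        """Look up `tag`; return True on a hit, otherwise fill and return False."""
        self.clock += 1
        for line in self.lines:
            if line.tag == tag:
                line.last_use = self.clock
                if not on_wrong_path:
                    # A correct-path use shows the wrong-path fill acted as a useful prefetch.
                    line.wrong_path = False
                return True
        if len(self.lines) == self.ways:
            self.lines.remove(self._victim())
        self.lines.append(Line(tag, on_wrong_path, self.clock))
        return False

    def _victim(self) -> Line:
        flagged = [line for line in self.lines if line.wrong_path]
        pool = flagged if flagged else self.lines
        return min(pool, key=lambda line: line.last_use)

    def tags(self) -> List[int]:
        return sorted(line.tag for line in self.lines)


if __name__ == "__main__":
    s = WrongPathAwareSet(ways=2)
    s.access(1, on_wrong_path=False)  # correct-path fill
    s.access(2, on_wrong_path=True)   # wrong-path fill
    s.access(3, on_wrong_path=False)  # evicts block 2, although block 1 is older
    print(s.tags())
```

With two ways, filling block 1 on the correct path and block 2 on a wrong path, a third correct-path miss evicts block 2 even though block 1 is older. Plain LRU would have evicted block 1. A wrong-path block that is later referenced on the correct path loses its flag and competes under ordinary LRU, so this sketch keeps the prefetching benefit while limiting pollution.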