In high-performance general-purpose workstations and servers, the workload typically consists of both sequential and parallel applications. Shared-bus shared-memory multiprocessors can be used to speed up the execution of such a workload. In this environment, the scheduler balances the load by allocating each ready process to the first available processor, which causes process migration. Process migration, combined with the persistence of private data in different caches, produces an undesired form of sharing called passive sharing. The copies created by passive sharing generate useless coherence traffic on the bus, and coping with this problem can be a challenging design issue for these machines. Many protocols use clever techniques to limit the overhead of maintaining coherence among shared copies. None of these studies addresses passive sharing directly, although some of them reduce it indirectly while dealing with other kinds of sharing. Affinity scheduling can alleviate the problem, but this technique does not adapt to all load conditions, especially when migration is massive. We present a simple coherence protocol that eliminates passive sharing using information from the compiler that is normally available in operating system kernels. We evaluate the performance of this protocol and compare it against other solutions proposed in the literature by means of enhanced trace-driven simulation. We evaluate its complexity in terms of the number of protocol states, additional bus lines, and required software support. Our protocol further limits the overhead of maintaining coherence by using information about the access patterns to shared data exhibited by parallel applications.
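To make the cost of passive sharing concrete, here is a toy single-block simulation in Python. It is a sketch under stated assumptions, not the protocol evaluated in the paper: it assumes a MESI-style write-invalidate snooping protocol on one bus, and a hypothetical `private_hint` flag stands in for the compiler/kernel information that marks a block as private. The names `Bus`, `State`, and `run_migration_trace` are illustrative only. Without the hint, the copy left behind by a migrated process becomes a Shared (passive) copy, so the next write must go on the bus to invalidate it; with the hint, the stale copy is dropped on the read miss and the new owner can write silently.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"
    MODIFIED = "M"


class Bus:
    """Toy shared bus with MESI-style snooping caches tracking a single block."""

    def __init__(self, n_cpus, private_hint):
        self.states = [State.INVALID] * n_cpus
        # Hypothetical flag: the block is known (from compiler/OS info) to be private.
        self.private_hint = private_hint
        self.transactions = 0  # number of bus transactions issued

    def read(self, cpu):
        if self.states[cpu] != State.INVALID:
            return  # read hit: no bus traffic
        self.transactions += 1  # read miss goes on the bus
        others = [c for c, s in enumerate(self.states)
                  if c != cpu and s != State.INVALID]
        if not others:
            self.states[cpu] = State.EXCLUSIVE
        elif self.private_hint:
            # Private data: drop the copy left by the migrated process
            # instead of keeping it as a passive shared copy.
            for c in others:
                self.states[c] = State.INVALID
            self.states[cpu] = State.EXCLUSIVE
        else:
            # Baseline: the old copy becomes Shared (passive sharing).
            for c in others:
                self.states[c] = State.SHARED
            self.states[cpu] = State.SHARED

    def write(self, cpu):
        if self.states[cpu] in (State.EXCLUSIVE, State.MODIFIED):
            self.states[cpu] = State.MODIFIED  # silent upgrade, no bus traffic
            return
        self.transactions += 1  # invalidate other copies / read-for-ownership
        for c in range(len(self.states)):
            if c != cpu:
                self.states[c] = State.INVALID
        self.states[cpu] = State.MODIFIED


def run_migration_trace(private_hint):
    """One process reads and writes its private block, migrating CPU0 -> CPU1 -> CPU0."""
    bus = Bus(n_cpus=2, private_hint=private_hint)
    for cpu in (0, 1, 0):
        bus.read(cpu)
        bus.write(cpu)
    return bus.transactions


if __name__ == "__main__":
    print("baseline:", run_migration_trace(private_hint=False))
    print("private hint:", run_migration_trace(private_hint=True))
```

On this trace the baseline issues 5 bus transactions (3 read misses plus 2 invalidations of passive copies), while the hinted variant issues only the 3 read misses. The toy model ignores write-backs, multiple blocks, and timing; it only isolates the invalidation traffic that passive copies create.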