In this work, we characterized the memory performance of a shared-bus shared-memory multiprocessor running a DSS workload, and in particular the impact of coherence overhead and process migration. When the number of processors is increased to achieve higher computational power, the bus becomes a major bottleneck of such an architecture. We evaluated solutions that can greatly reduce that bottleneck. Database systems are one area where this kind of optimization is important. For this reason, we considered a DSS workload and set up the experiments following the TPC-D specifications on the PostgreSQL DBMS, so that the optimizations are explored on the same kind of workload as evaluated in the literature. In this scenario, we compared possible solutions to boost performance and showed the impact of process migration on coherence overhead. We found that coherence overhead and process migration have a major effect on performance in machines with 16 or more processors. In this case, even a small amount of sharing, as in DSS applications, can become crucial for system performance. Another important result of our analysis concerns the interaction between the coherence protocol and the scheduler. Basic cache-affinity scheduling is useful in reducing migration, but it is not effective under every load condition. Specific coherence protocols can help reduce the effects of process migration, especially when the scheduler cannot apply the affinity requirement. In these conditions, a write-update protocol with a selective invalidation strategy for private data improves performance (and scalability) by about 20% with respect to a classical MESI-based solution. The advantage is about 50% in the case of high cache-to-cache transfer.