This work shows how a DBMS workload running on a shared-bus, shared-memory multiprocessor can be accelerated by adding simple support to the MESI coherence protocol. The workload is the TPC-D benchmark running on the PostgreSQL DBMS. Results show that, for a DSS workload, a write-update (WU) protocol with a selective invalidation strategy for private data improves performance. Once the contribution of passive sharing is eliminated, the gain comes from the access pattern to shared data and from lower bus utilisation, since invalidation misses are absent. With 16 processors, the advantage amounts to a 20% performance increase. Finally, it is shown how the results extend to other DBMS workloads.
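The trade-off described above can be illustrated with a toy model. The sketch below is not the simulator or the protocol of this work: it assumes that WU means write-update, that caches are unbounded, and that every miss, invalidation, and update costs one bus transaction. The function name `simulate`, the protocol labels, and the trace are hypothetical and chosen only for illustration. It compares a write-invalidate scheme, a plain write-update scheme, and a write-update scheme that invalidates remote copies of private data instead of updating them.

```python
from collections import defaultdict

PROTOCOLS = ("invalidate", "update", "update_selective")  # hypothetical labels

def simulate(trace, protocol):
    """Count bus transactions in a toy coherence model with unbounded caches.

    trace: iterable of (cpu, op, block, private), op is "r" or "w".
    Each miss, invalidation, or update is assumed to cost one bus transaction.
    """
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol}")
    holders = defaultdict(set)  # block -> cpus holding a valid copy
    stats = {"misses": 0, "invalidations": 0, "updates": 0}
    for cpu, op, block, private in trace:
        copies = holders[block]
        if cpu not in copies:
            stats["misses"] += 1  # block fetched over the bus
            copies.add(cpu)
        if op == "w" and copies - {cpu}:
            # Remote copies exist: either invalidate them or update them.
            selective = protocol == "update_selective" and private
            if protocol == "invalidate" or selective:
                stats["invalidations"] += 1
                holders[block] = {cpu}
            else:
                stats["updates"] += 1
    stats["bus"] = stats["misses"] + stats["invalidations"] + stats["updates"]
    return stats

# Hypothetical trace. Block "P" is private to a process that ran on cpu 1
# and then migrated to cpu 0, leaving a stale copy behind (passive sharing).
# Block "S" is shared: cpu 0 produces a value and cpu 1 consumes it.
trace = [(1, "r", "P", True)]
trace += [(0, "w", "P", True)] * 4
trace += [(0, "w", "S", False), (1, "r", "S", False)] * 3

for name in PROTOCOLS:
    print(name, simulate(trace, name))
```

On this trace the write-invalidate variant pays for repeated invalidation misses on the shared block, plain write-update pays for useless updates to the stale private copy, and the selective variant avoids both. The numbers say nothing about the magnitude of the effect on a real TPC-D run; they only show the mechanism.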