Reducing coherence overhead and boosting performance of high-end SMP multiprocessors running a DSS workload

  • Authors:
  • Pierfrancesco Foglia, Roberto Giorgi, Cosimo Antonio Prete

  • Affiliations:
  • Dipartimento di Ingegneria dell'Informazione, Università di Pisa, Via Diotisalvi 2, 56126 Pisa, Italy; Dipartimento di Ingegneria dell'Informazione, Università di Siena, Via Roma 56, 53100 Siena, Italy; Dipartimento di Ingegneria dell'Informazione, Università di Pisa, Via Diotisalvi 2, 56126 Pisa, Italy

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2005

Abstract

In this work, we characterized the memory performance, and in particular the impact of coherence overhead and process migration, of a shared-bus shared-memory multiprocessor running a DSS workload. When the number of processors is increased in order to achieve higher computational power, the bus becomes a major bottleneck of such an architecture. We evaluated solutions that can greatly reduce that bottleneck. One area where this kind of optimization is important is database systems. For this reason, we considered a DSS workload and set up the experiments following the TPC-D specification on the PostgreSQL DBMS, so as to explore different optimizations on the same kind of workloads evaluated in the literature. In this scenario, we compare possible solutions to boost performance and show the impact of process migration on coherence overhead. We found that the consequences of coherence overhead and process migration on performance are very significant in machines with 16 or more processors. In this case, even the little sharing present in DSS applications can become crucial for system performance. Another important result of our analysis concerns the interaction between the coherence protocol and the scheduler. Basic cache-affinity scheduling is useful in reducing migration, but it is not effective under every load condition. Specific coherence protocols can help reduce the effects of process migration, especially in situations where the scheduler cannot satisfy the affinity requirement. Under these conditions, the use of a write-update protocol with a selective invalidation strategy for private data improves performance (and scalability) by about 20% with respect to a classical MESI-based solution. This advantage reaches about 50% in the case of high cache-to-cache transfer rates.
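
To make the protocol idea concrete, the following minimal C sketch (our own illustration, not code from the paper; the line flags, type names, and function names are assumptions) shows the decision a write-update protocol with selective invalidation of private data might take on a write hit: genuinely shared lines are updated on the bus, while copies of lines flagged as private, such as those left behind in another cache by process migration, are invalidated instead of being repeatedly updated.

    /* Hypothetical sketch of the write-hit decision in a write-update
     * protocol with selective invalidation of private data.
     * Illustrative only; not the authors' simulator code. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        bool shared_by_others;  /* copies of the line exist in other caches */
        bool flagged_private;   /* line identified as private data, e.g. a
                                   per-process line left behind by migration */
    } cache_line_t;

    typedef enum { NO_BUS_OP, BUS_UPDATE, BUS_INVALIDATE } bus_op_t;

    /* Choose the bus transaction issued on a processor write hit. */
    bus_op_t on_write_hit(const cache_line_t *line)
    {
        if (!line->shared_by_others)
            return NO_BUS_OP;       /* exclusive copy: write locally */
        if (line->flagged_private)
            return BUS_INVALIDATE;  /* stale private copies are invalidated
                                       once, avoiding useless update traffic */
        return BUS_UPDATE;          /* truly shared data: broadcast the new
                                       value so remote readers stay current */
    }

    int main(void)
    {
        cache_line_t shared_line   = { true, false };
        cache_line_t migrated_line = { true, true  };
        printf("shared line   -> op %d (update)\n",
               on_write_hit(&shared_line));
        printf("migrated line -> op %d (invalidate)\n",
               on_write_hit(&migrated_line));
        return 0;
    }

Under this sketch, write-update behavior is preserved for actively shared data, while the selective-invalidation path removes the passive copies that process migration tends to scatter across caches, which is the mechanism the abstract credits for the reduction in coherence overhead.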