Data marshaling for multi-core architectures

  • Authors:
  • M. Aater Suleman; Onur Mutlu; José A. Joao; Khubaib; Yale N. Patt

  • Affiliations:
  • The University of Texas, Austin, Austin, TX, USA; Carnegie Mellon University, Pittsburgh, PA, USA; The University of Texas, Austin, Austin, TX, USA; The University of Texas, Austin, Austin, TX, USA; The University of Texas, Austin, Austin, TX, USA

  • Venue:
  • Proceedings of the 37th annual international symposium on Computer architecture
  • Year:
  • 2010

Abstract

Previous research has shown that Staged Execution (SE), i.e., dividing a program into segments and executing each segment at the core that has the data and/or functionality to best run that segment, can improve performance and save power. However, SE's benefit is limited because most segments access inter-segment data, i.e., data generated by the previous segment. When consecutive segments run on different cores, accesses to inter-segment data incur cache misses, thereby reducing performance. This paper proposes Data Marshaling (DM), a new technique to eliminate cache misses to inter-segment data. DM uses profiling to identify instructions that generate inter-segment data, and adds only 96 bytes/core of storage overhead. We show that DM significantly improves the performance of two promising Staged Execution models, Accelerated Critical Sections and producer-consumer pipeline parallelism, on both homogeneous and heterogeneous multi-core systems. In both models, DM can achieve almost all of the potential of ideally eliminating cache misses to inter-segment data. DM's performance benefit increases with the number of cores.
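
The effect described in the abstract can be illustrated with a toy model. The sketch below is not the paper's hardware design: it assumes each core has a private cache modeled as a set of cache-line addresses, that segment i of a pipeline runs on core i, and that marshaling simply copies the lines a segment produces into the next core's cache when the segment finishes. The function name `run_pipeline`, the `marshal` flag, and the line addresses are all hypothetical.

```python
# Toy model of inter-segment data misses in Staged Execution.
# Assumptions (not from the abstract): private per-core caches modeled as
# sets of cache-line addresses; segment i runs on core i; coherence
# details such as invalidation are ignored.

def run_pipeline(segments, marshal):
    """Return the number of cache misses on inter-segment data.

    Each segment is a (reads, writes) pair: the cache lines it reads from
    the previous segment and the lines it produces for the next one.
    """
    caches = [set() for _ in segments]  # one private cache per core
    misses = 0
    for core, (reads, writes) in enumerate(segments):
        for line in sorted(reads):
            if line not in caches[core]:
                misses += 1             # line must be fetched from another core
                caches[core].add(line)
        caches[core].update(writes)     # produced data lands in the local cache
        if marshal and core + 1 < len(segments):
            # Marshaling step: push produced lines to the next core's cache
            caches[core + 1].update(writes)
    return misses


segments = [
    (set(), {0x100, 0x140}),            # stage 0 produces two lines
    ({0x100, 0x140}, {0x200}),          # stage 1 consumes them, produces one
    ({0x200}, set()),                   # stage 2 consumes stage 1's output
]

baseline = run_pipeline(segments, marshal=False)
with_dm = run_pipeline(segments, marshal=True)
print(baseline, with_dm)
```

In this toy setting every inter-segment read misses without marshaling and none miss with it, which corresponds to the ideal case the abstract refers to. Real hardware would also have to bound how many lines it tracks per segment, a constraint this model leaves out.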