FELI: HW/SW support for on-chip distributed shared memory in multicores

  • Authors:
  • Carlos Villavieja;Yoav Etsion;Alex Ramirez;Nacho Navarro

  • Affiliations:
  • Universitat Politecnica de Catalunya and Barcelona Supercomputing Center, Barcelona, Spain;Barcelona Supercomputing Center, Barcelona, Spain;Universitat Politecnica de Catalunya and Barcelona Supercomputing Center, Barcelona, Spain;Universitat Politecnica de Catalunya and Barcelona Supercomputing Center, Barcelona, Spain

  • Venue:
  • Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
  • Year:
  • 2011

Abstract

Modern Chip Multiprocessors (CMPs) composed of accelerators and on-chip scratchpad memories are emerging as power-efficient architectures. However, these architectures are hard to program because they require efficient data allocation. In addition, legacy applications cannot benefit from the high computational power of these architectures unless their code is adapted to use the distributed memory architecture. In this paper, we propose FELI, a set of operating system mechanisms that allocate application data to on-chip memories without any user intervention. FELI automatically maps data to on-chip memories using the address translation mechanism. It relies on a set of TLB counters and on dynamic migration of pages from off-chip memory to on-chip memory. We also introduce virtually tagged L0 caches to alleviate the address translation overhead. Moreover, we compare performance and power consumption against a homogeneous cache-based CMP design. Our evaluation shows that the scratchpad-based CMP achieves an average 50% improvement in power consumption over a cache-based CMP, and a 10% improvement in average memory access time, even accounting for the cost of page migrations and TLB invalidations. FELI can automatically allocate on-chip memory to an average of 90% of an application's working set.
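
The counter-driven page migration described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the mechanism evaluated in the paper: the class name `PageMigrator`, the promotion `threshold`, the fixed on-chip `capacity`, the 4 KiB page size, and the "evict the coldest resident page" policy are all hypothetical choices made for illustration. The abstract states only that FELI uses TLB counters and migrates pages from off-chip to on-chip memory.

```python
from dataclasses import dataclass, field


@dataclass
class PageMigrator:
    """Toy model: per-page access counters decide which pages live on-chip.

    All parameters and policies here are illustrative assumptions.
    """
    capacity: int                                   # on-chip page frames (assumed)
    threshold: int                                  # accesses before promotion (assumed)
    counters: dict = field(default_factory=dict)    # page number -> access count
    on_chip: set = field(default_factory=set)       # pages currently on-chip
    migrations: int = 0                             # pages moved on-chip so far

    def access(self, vaddr: int, page_size: int = 4096) -> bool:
        """Record one access; return True if it was served from on-chip memory."""
        page = vaddr // page_size
        self.counters[page] = self.counters.get(page, 0) + 1
        if page in self.on_chip:
            return True
        # The access itself goes off-chip; a hot page is promoted afterwards.
        if self.counters[page] >= self.threshold:
            self._promote(page)
        return False

    def _promote(self, page: int) -> None:
        """Move a page on-chip, evicting the coldest resident page if full."""
        if len(self.on_chip) >= self.capacity:
            victim = min(self.on_chip, key=lambda p: self.counters[p])
            if self.counters[victim] >= self.counters[page]:
                return  # resident pages are at least as hot; skip the migration
            self.on_chip.remove(victim)
        self.on_chip.add(page)
        self.migrations += 1


def run_demo() -> PageMigrator:
    """Drive the model with a short, hand-written access trace."""
    m = PageMigrator(capacity=1, threshold=2)
    for vaddr in (0, 8, 16, 4096, 4096):
        m.access(vaddr)
    return m
```

In this trace, page 0 is promoted after its second access and serves the third access on-chip, while page 1 reaches the threshold but is not migrated because the resident page is hotter. A real implementation would also have to account for the cost of each migration and the associated TLB invalidation, which the abstract names as overheads included in the evaluation.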