Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures

  • Authors:
  • Andrei Terechko;Jan Hoogerbrugge;Ghiath Alkadi;Surendra Guntur;Anirban Lahiri;Marc Duranton;Clemens Wüst;Phillip Christie;Axel Nackaerts;Aatish Kumar

  • Affiliations:
  • NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands;NXP Semiconductors, The Netherlands

  • Venue:
  • ACM Transactions on Embedded Computing Systems (TECS)
  • Year:
  • 2012

Abstract

Multicore architectures provide scalable performance with a lower hardware design effort than single-core processors. Our article presents a design methodology and an embedded multicore architecture, focusing on reducing the software design complexity and boosting the performance density. First, we analyze characteristics of the Task-Level Parallelism in modern multimedia workloads. These characteristics are used to formulate requirements for the programming model. Then we translate the programming model requirements into an architecture specification, including a novel low-complexity implementation of cache coherence and a hardware synchronization unit. Our evaluation demonstrates that the novel coherence mechanism substantially simplifies hardware design, while reducing performance by less than 18% relative to a complex snooping technique. Compared to a single processor core, multicores have already proven to be more area- and energy-efficient. However, multicore architectures in embedded systems still compete with highly efficient function-specific hardware accelerators. In this article we identify five architectural methods to boost the performance density of multicores: microarchitectural downscaling, asymmetric multicore architectures, multithreading, generic accelerators, and conjoining. Then, we present a novel methodology to explore multicore design spaces, including the architectural methods that improve the performance density. The methodology is based on a complex formula that computes the performance of heterogeneous multicore systems. Using this design space exploration methodology for HD and QuadHD H.264 video decoding, we estimate that the required areas of multicores in 45 nm CMOS are 2.5 mm² and 8.6 mm², respectively. These results suggest that heterogeneous multicores are cost-effective for embedded applications and can provide good programmability support.