A queueing theoretic approach for performance evaluation of low-power multi-core embedded systems

  • Authors:
  • Arslan Munir, Ann Gordon-Ross, and Sanjay Ranka

  • Affiliations:
  • Arslan Munir and Ann Gordon-Ross: Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA
  • Sanjay Ranka: Department of Computer and Information Science and Engineering, University of Florida, Gainesville, USA

  • Venue:
  • ICCD '11 Proceedings of the 2011 IEEE 29th International Conference on Computer Design
  • Year:
  • 2011

Abstract

With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multi-core to exploit this high transistor density for high performance. However, the optimal layout of these multiple cores along with the memory subsystem (caches and main memory) to satisfy power, area, and often stringent real-time constraints is a challenging design endeavor. The short time-to-market constraint of embedded systems exacerbates this design challenge and necessitates architectural modeling of embedded systems to reduce time-to-market by expediting the mapping of target applications to devices/architectures. In this paper, we present a queueing theoretic approach for modeling multi-core embedded systems that provides a performance evaluation that is quick and inexpensive, in terms of both time and resources, as compared to developing multi-core simulators and running benchmarks on them. We also calculate chip area and power consumption for different multi-core embedded architectures with varying numbers of processor cores and cache configurations to provide a comparative analysis of multi-core embedded architectures in terms of performance, area, and power consumption. Our performance and power results indicate that multi-core embedded system architectures that leverage shared last-level caches (LLCs) provide the best LLC performance per watt but may introduce main memory response time and throughput bottlenecks at high cache miss rates, whereas architectures leveraging a hybrid of private and shared LLCs alleviate main memory bottlenecks at the expense of reduced performance per watt.
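As a rough illustration of the kind of evaluation a queueing model enables (this is a minimal sketch, not the authors' actual model), the code below treats each level of a small memory hierarchy, private L1 caches, a shared LLC, and main memory, as an M/M/1 queue and estimates per-level response times from assumed request and miss rates. The two-core configuration, all numerical rates, and the helper function are illustrative assumptions.

```python
# Minimal sketch (assumed configuration, not the paper's model): estimate per-level
# response times for a two-core system with private L1s, a shared LLC, and main
# memory, modeling each level as an M/M/1 queue. Traffic into each lower level is
# the traffic of the level above scaled by that level's miss rate.

def mm1_response_time(arrival_rate, service_rate):
    """Mean response time W = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)

# Per-core request rate and miss rates (illustrative values).
core_request_rate = 2.0e9      # memory requests per second issued by each core
l1_miss_rate      = 0.05       # fraction of L1 accesses that miss
llc_miss_rate     = 0.30       # fraction of LLC accesses that miss
num_cores         = 2

# Service rates (requests per second) for each level (illustrative values).
l1_service_rate   = 4.0e9      # each private L1
llc_service_rate  = 1.0e9      # shared LLC serves misses from all cores
mem_service_rate  = 2.0e8      # main memory

# Arrival rates seen by each level of the hierarchy.
l1_arrival  = core_request_rate                              # per private L1
llc_arrival = num_cores * core_request_rate * l1_miss_rate   # shared LLC
mem_arrival = llc_arrival * llc_miss_rate                    # main memory

t_l1  = mm1_response_time(l1_arrival, l1_service_rate)
t_llc = mm1_response_time(llc_arrival, llc_service_rate)
t_mem = mm1_response_time(mem_arrival, mem_service_rate)

# Average memory-access time per core request, weighted by miss probabilities.
avg_access_time = t_l1 + l1_miss_rate * (t_llc + llc_miss_rate * t_mem)
print(f"L1: {t_l1*1e9:.3f} ns  LLC: {t_llc*1e9:.3f} ns  Memory: {t_mem*1e9:.3f} ns")
print(f"Average memory access time per request: {avg_access_time*1e9:.3f} ns")
```

Even in this toy setting, raising the LLC miss rate pushes the main memory arrival rate toward its service rate, so the memory queue's response time grows sharply, which mirrors the main memory bottleneck behavior the abstract attributes to shared-LLC architectures at high cache miss rates.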