Rapid runtime estimation methods for pipelined MPSoCs

Authors:
Haris Javaid;Andhi Janapsatya;Mohammad Shihabul Haque;Sri Parameswaran
Affiliations:
University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia;University of New South Wales, Sydney, Australia
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2010

Abstract

The pipelined Multiprocessor System on Chip (MPSoC) paradigm is well suited to the data-flow nature of streaming applications. A pipelined MPSoC is a system in which processing elements (PEs) are connected in a pipeline. Each PE can be implemented using any one of several processor configurations available for that PE (configurations differ in instruction sets and cache sizes). The design goal is to select a pipelined MPSoC, that is, a mapping that assigns a processor configuration to every PE. To estimate the runtime of a pipelined MPSoC, designers typically perform cycle-accurate simulation of the whole pipelined system. Because the number of possible pipelined implementations can be on the order of billions, simulating each one is impractical, and fast estimation methods are necessary. In this paper, we propose two methods to estimate the runtime of a pipelined MPSoC while minimizing the use of slow cycle-accurate simulations. The first method performs cycle-accurate simulations of individual processor configurations (rather than of the whole pipelined system) and then uses an analytical model to estimate the runtime of the pipelined system. In the second method, the runtimes of individual processor configurations are themselves estimated using an analytical processor model, which combines cycle-accurate simulations of selected configurations with an equation based on ISA and cache statistics; these estimated runtimes are then used to estimate the total runtime of the pipelined system. Evaluating our approach on three benchmarks, we show that the maximum estimation errors are 5.91% and 16.45%, and the average estimation errors are 2.28% and 6.30%, for the first and second methods respectively. Because the design spaces considered in this paper contain at least 10^10 design points, simulating all possible pipelined implementations (design points) with a cycle-accurate simulator would take on the order of years.
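The abstract does not spell out either analytical model. Purely as an illustration, the Python sketch below assumes two simple, commonly used forms. The first is a bottleneck pipeline model: after the pipeline fills, one data item completes per interval of the slowest stage. The second is a first-order processor model: cycles are a base instruction-execution term plus a fixed penalty per cache miss. The names `ConfigStats`, `estimate_pe_cycles` and `estimate_pipeline_cycles`, the 20-cycle miss penalty, and all numbers are hypothetical and are not the authors' equations. In the spirit of the first method, per-stage cycles would come from cycle-accurate simulation of each configuration. In the spirit of the second, they would come from a processor model such as `estimate_pe_cycles`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfigStats:
    """Hypothetical per-data-item statistics for one processor configuration."""
    instructions: int   # dynamic instructions executed per data item
    cpi_base: float     # assumed CPI with ideal caches (e.g. calibrated by simulation)
    icache_misses: int  # instruction-cache misses per data item
    dcache_misses: int  # data-cache misses per data item


def estimate_pe_cycles(stats: ConfigStats, miss_penalty: int = 20) -> float:
    """First-order processor model (assumption): base execution cycles
    plus a fixed stall penalty for every cache miss."""
    base = stats.instructions * stats.cpi_base
    stalls = (stats.icache_misses + stats.dcache_misses) * miss_penalty
    return base + stalls


def estimate_pipeline_cycles(stage_cycles: Sequence[float], n_items: int) -> float:
    """Bottleneck pipeline model (assumption): the first item passes through
    every stage, then one item completes per slowest-stage interval."""
    if n_items <= 0 or not stage_cycles:
        return 0.0
    fill = sum(stage_cycles)
    return fill + (n_items - 1) * max(stage_cycles)


# Example: a hypothetical three-stage pipeline processing 100 data items.
stages = [
    ConfigStats(instructions=1000, cpi_base=1.0, icache_misses=10, dcache_misses=15),
    ConfigStats(instructions=2000, cpi_base=1.5, icache_misses=5, dcache_misses=40),
    ConfigStats(instructions=800, cpi_base=1.25, icache_misses=2, dcache_misses=8),
]
per_stage = [estimate_pe_cycles(s) for s in stages]  # [1500.0, 3900.0, 1200.0]
total = estimate_pipeline_cycles(per_stage, n_items=100)  # 392700.0
print(per_stage, total)
```

Here the second stage is the bottleneck, so it dominates the estimate. A designer would look for a faster configuration for that PE first.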
Simulating all processor configurations individually (first method), however, takes only tens of hours, while simulating a subset of processor configurations and estimating configuration runtimes with the analytical processor model (second method) takes only a few hours. Once these simulations are done, the runtime of each pipelined implementation can be estimated within milliseconds.
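Once the per-configuration runtimes are tabulated, scoring a design point needs no simulation. This holds whether the runtimes come from simulation (first method) or from the processor model (second method), because each score reduces to table lookups and a little arithmetic. The sketch below illustrates this under the same assumed bottleneck pipeline model as before. The functions `design_space_size` and `best_design`, the table layout `table[pe][config] -> cycles per item`, and the configuration names are all hypothetical. It also shows why the space explodes. If every PE chooses independently, the number of design points is the product of the per-PE choice counts. For example, 10 PEs with 10 configurations each (an assumed breakdown) already yield 10^10 points.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def design_space_size(configs_per_pe: Sequence[int]) -> int:
    """Each PE independently picks one configuration, so the number of
    pipelined implementations is the product of the per-PE choice counts."""
    return math.prod(configs_per_pe)


def pipeline_estimate(stage_cycles: Sequence[float], n_items: int) -> float:
    """Assumed bottleneck model: fill time plus one slowest-stage
    interval for each remaining item."""
    return sum(stage_cycles) + (n_items - 1) * max(stage_cycles)


def best_design(table: List[Dict[str, float]], n_items: int) -> Tuple[Tuple[str, ...], float]:
    """Score every design point from precomputed per-configuration cycles
    (hypothetical table[pe][config] -> cycles per item) and keep the fastest."""
    if not table or any(not pe for pe in table):
        raise ValueError("every PE needs at least one configuration")
    best = None
    for choice in itertools.product(*(sorted(pe.items()) for pe in table)):
        names = tuple(name for name, _ in choice)
        cycles = pipeline_estimate([c for _, c in choice], n_items)
        if best is None or cycles < best[1]:
            best = (names, cycles)
    return best


# Hypothetical per-item cycles for two configurations of each of three PEs.
table = [
    {"small_cache": 1800.0, "large_cache": 1500.0},
    {"base_isa": 3900.0, "custom_isa": 2600.0},
    {"small_cache": 1200.0, "large_cache": 1100.0},
]
print(design_space_size([len(pe) for pe in table]))  # 8
print(best_design(table, n_items=100))
# (('large_cache', 'custom_isa', 'large_cache'), 262600.0)
print(design_space_size([10] * 10) == 10**10)  # True
```

Exhaustive enumeration as written is only practical for small spaces. The point of the sketch is that each evaluation costs a few lookups rather than a full-system simulation.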