Proceedings of the 15th international conference on Multimedia
Currently, performance analysis of multimedia-MPSoC platforms relies largely on simulation. The execution of one or more applications on such a platform is simulated for a library of test video clips. If all specified performance constraints are satisfied for this library, the architecture is assumed to be well designed. This is similar to testing software for functional correctness. However, in contrast to functional testing, simulating a set of video clips for a complex application/architecture is extremely time-consuming. In this paper we propose a technique for clustering a library of video clips such that it is sufficient to simulate only one clip from each cluster rather than the entire library. Our clustering is scalable, i.e., the number of clusters can be set to the number of clips that the system designer wishes to simulate, independent of the size of the input library. For each video clip in the library, we perform a fast bitstream analysis from which the workload generated while processing this clip on the given architecture can be estimated. This workload information, in conjunction with a workload model and a performance model of the architecture, is used for the clustering. The entire process does not involve any simulation and is hence extremely fast. We illustrate its utility through a detailed case study using an MPEG-2 decoder application running on an MPSoC platform. As part of the validation of our methodology, we observed that video clips falling into the same cluster exhibit similar worst-case buffer backlogs and worst-case delays for one macroblock. Overall, the results demonstrate that the proposed method provides a very fast and accurate analysis and hence can be of significant benefit to the system designer.
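The clustering step can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or workload features are used, so everything below is an assumption made for illustration: each clip is reduced to a hypothetical two-element feature vector (normalized average and peak workload per macroblock, as might come out of a bitstream analysis), plain k-means stands in for the actual clustering, and the clip nearest each cluster centroid is taken as the one clip to simulate. The names `cluster_clips`, `features`, and the clip labels are invented.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def init_centroids(vectors, k):
    # Deterministic farthest-point seeding: start from the first vector, then
    # repeatedly add the vector farthest from the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(dist(v, c) for c in centroids))
        )
    return centroids


def cluster_clips(features, k, max_iterations=50):
    """Group clips into k clusters (k-means, an assumed stand-in) and pick
    one representative clip per cluster.

    features: dict mapping clip name -> workload feature vector (hypothetical).
    k: number of clips the designer is willing to simulate.
    """
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of clips")
    names = sorted(features)
    vectors = [tuple(features[n]) for n in names]
    centroids = init_centroids(vectors, k)
    assignment = []
    for _ in range(max_iterations):
        # Assign every clip to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: dist(v, centroids[j])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no clip changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = mean(members)
    clusters = {}
    for name, a in zip(names, assignment):
        clusters.setdefault(a, []).append(name)
    # The representative is the member closest to its cluster centroid.
    representatives = {
        j: min(members, key=lambda n: dist(features[n], centroids[j]))
        for j, members in clusters.items()
    }
    return clusters, representatives


# Hypothetical, pre-normalized (average, peak) workload per macroblock.
features = {
    "clip_a": (0.10, 0.20),
    "clip_b": (0.12, 0.22),
    "clip_c": (0.80, 0.90),
    "clip_d": (0.82, 0.88),
    "clip_e": (0.45, 0.50),
    "clip_f": (0.47, 0.55),
}
clusters, representatives = cluster_clips(features, k=3)
for j in sorted(clusters):
    print(f"cluster {j}: {clusters[j]} -> simulate {representatives[j]}")
```

Here `k` plays the role of the designer's simulation budget: it is chosen independently of how many clips the library contains, which is the scalability property the abstract describes. Whether clips grouped this way actually share similar worst-case backlogs and delays depends on the workload and performance models, which this sketch does not reproduce.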