Architectural exploration of heterogeneous multiprocessor systems for JPEG

Authors:
Seng Lin Shee;Andrea Erdos;Sri Parameswaran
Affiliations:
School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia and National Information and Communications Technology Australia, Sydney, Australia;School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia;School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia and National Information and Communications Technology Australia, Sydney, Australia
Venue:
International Journal of Parallel Programming - Special Issue on Multiprocessor-based embedded systems
Year:
2008

Abstract

Multicore processors have been used in embedded systems and general computing applications for some time. Typically, however, these chips execute multiple applications concurrently, with each core carrying out a particular task in the system. Such systems can be found in gaming, automotive real-time systems and video/image encoding devices. They are commonly deployed to avoid deadline misses, which arise primarily from overloading a single multitasking core. In this paper, we explore the use of multiple cores for a single application, rather than for multiple applications executing in parallel. A single application is parallelized using two different methods: a master-slave model and a sequential pipeline model. The systems were implemented using Tensilica's Xtensa LX processors, with queues as the means of communication between cores. In the master-slave model, we used a coarse-grained approach in which a main core distributes the workload to the remaining cores and reads back the processed data before writing the results to a file. In the pipeline model, a finer granularity is used: the application is partitioned into multiple sequential blocks, each block forming one stage of a sequential pipeline. For both models, we evaluated configurations ranging from a single core to a nine-core system. We found that, without any optimization, the seven-core sequential pipeline uses area more efficiently than the seven-core master-slave system, with an area-increase-to-speedup ratio of 1.83 compared with 4.34. With selective optimization in the pipeline approach, we obtained speedups of up to 4.6× with an area increase of only 3.1× (an area-increase-to-speedup ratio of just 0.68).
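As a rough illustration of the two parallelization models described above, the Python sketch below runs a list of toy stages either as a master-slave pool or as a queue-connected pipeline, with threads standing in for cores and `queue.Queue` standing in for the inter-core FIFOs. The stage functions, the integer "blocks", the round-robin distribution policy and the function names (`run_sequential`, `run_master_slave`, `run_pipeline`) are all hypothetical; this is a sketch of how work and data flow under each model, not the authors' Xtensa LX implementation, and Python threads will not reproduce the reported speedups or area figures.

```python
import queue
import threading
from typing import Callable, List, Sequence

# Hypothetical stand-in for one encoder stage (toy integer function).
Stage = Callable[[int], int]

# End-of-stream marker sent through the queues.
_DONE = object()


def run_sequential(blocks: Sequence[int], stages: Sequence[Stage]) -> List[int]:
    """Single-core baseline: apply every stage to each block in order."""
    results = []
    for block in blocks:
        for stage in stages:
            block = stage(block)
        results.append(block)
    return results


def run_master_slave(blocks: Sequence[int], stages: Sequence[Stage],
                     n_workers: int) -> List[int]:
    """Coarse-grained: the master hands whole blocks to workers, each of which
    runs *all* stages, then reads the results back in the original order."""
    # Unbounded queues for simplicity; hardware FIFOs have a fixed depth.
    to_worker = [queue.Queue() for _ in range(n_workers)]
    from_worker = [queue.Queue() for _ in range(n_workers)]

    def worker(q_in: queue.Queue, q_out: queue.Queue) -> None:
        while True:
            item = q_in.get()
            if item is _DONE:
                return
            for stage in stages:
                item = stage(item)
            q_out.put(item)

    threads = [threading.Thread(target=worker, args=(to_worker[w], from_worker[w]))
               for w in range(n_workers)]
    for t in threads:
        t.start()
    # Assumed policy: round-robin, block i goes to worker i % n_workers.
    for i, block in enumerate(blocks):
        to_worker[i % n_workers].put(block)
    for q in to_worker:
        q.put(_DONE)
    # Each worker's output queue is FIFO, so reading in the same round-robin
    # order restores the original block order.
    results = [from_worker[i % n_workers].get() for i in range(len(blocks))]
    for t in threads:
        t.join()
    return results


def run_pipeline(blocks: Sequence[int], stages: Sequence[Stage]) -> List[int]:
    """Finer-grained: one thread per stage, with a FIFO queue between
    consecutive stages, so different blocks occupy different stages at once."""
    links = [queue.Queue() for _ in range(len(stages) + 1)]

    def stage_worker(fn: Stage, q_in: queue.Queue, q_out: queue.Queue) -> None:
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)  # forward end-of-stream to the next stage
                return
            q_out.put(fn(item))

    threads = [threading.Thread(target=stage_worker, args=(fn, links[i], links[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for block in blocks:
        links[0].put(block)
    links[0].put(_DONE)
    results = []
    while (item := links[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    toy_stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    data = list(range(8))
    baseline = run_sequential(data, toy_stages)
    print(baseline)  # -> [-1, 1, 3, 5, 7, 9, 11, 13]
    print(run_master_slave(data, toy_stages, n_workers=3) == baseline)  # -> True
    print(run_pipeline(data, toy_stages) == baseline)  # -> True
```

In the master-slave variant every worker runs all stages on whole blocks (coarse-grained), whereas in the pipeline variant each thread runs a single stage on every block (finer-grained); both return the same results as the sequential baseline, differing only in how the work is split across cores.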