Introduction to parallel computing
Introduction to parallel computing
A tool for partitioning and pipelined scheduling of hardware-software systems
Proceedings of the 11th international symposium on System synthesis
Heterogeneous Computing: Goals, Methods, and Open Problems
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Multigrain Parallel Processing for JPEG Encoding on a Single Chip Multiprocessor
IWIA '02 Proceedings of the International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'02)
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
Task Partitioning Upon Heterogeneous Multiprocessor Platforms
RTAS '04 Proceedings of the 10th IEEE Real-Time and Embedded Technology and Applications Symposium
Heterogeneous Chip Multiprocessors
Computer
Task Partitioning with Replication upon Heterogeneous Multiprocessor Systems
RTAS '06 Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium
Macro pipelining based scheduling on high performance heterogeneousmultiprocessor systems
IEEE Transactions on Signal Processing
Custom-instruction synthesis for extensible-processor platforms
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Bi-criteria Pipeline Mappings for Parallel Image Processing
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I
Multi-Criteria Scheduling of Pipeline Workflows (and Application To the JPEG Encoder)
International Journal of High Performance Computing Applications
Dynamic Partitioning-based JPEG Decompression on Heterogeneous Multicore Architectures
Proceedings of Programming Models and Applications on Multicores and Manycores
Hi-index | 0.00 |
Multicore processors have been utilized in embedded systems and general computing applications for some time. However, these multicore chips execute multiple applications concurrently, with each core carrying out a particular task in the system. Such systems can be found in gaming, automotive real-time systems and video / image encoding devices. These system are commonly deployed to overcome deadline misses, which are primarily due to overloading of a single multitasking core. In this paper, we explore the use of multiple cores for a single application, as opposed to multiple applications executing in a parallel fashion. A single application is parallelized using two different methods: one, a master-slave model; and two, a sequential pipeline model. The systems were implemented using Tensilica's Xtensa LX processors with queues as the means of communications between two cores. In a master-slave model, we utilized a course grained approach whereby a main core distributes the workload to the remaining cores and reads the processed data before writing the results back to file. In the pipeline model, a lower granularity is used. The application is partitioned into multiple sequential blocks; each block representing a stage in a sequential pipeline. For both models we applied a number of differing configurations ranging from a single core to a nine-core system. We found that without any optimization for the seven core system, the sequential pipeline approach has a more efficient area usage, with an area increase to speedup ratio of 1.83 compared to the master-slave approach of 4.34. With selective optimization in the pipeline approach, we obtained speed ups of up to 4.6× while with an area increase of only 3.1× (area increase to speedup ratio of just 0.68).