Architectural exploration of heterogeneous multiprocessor systems for JPEG

  • Authors:
  • Seng Lin Shee;Andrea Erdos;Sri Parameswaran

  • Affiliations:
  • School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia and National Information and Communications Technology Australia, Sydney, Australia;School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia;School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia and National Information and Communications Technology Australia, Sydney, Australia

  • Venue:
  • International Journal of Parallel Programming - Special Issue on Multiprocessor-based embedded systems
  • Year:
  • 2008

Abstract

Multicore processors have been used in embedded systems and general computing applications for some time. However, these multicore chips typically execute multiple applications concurrently, with each core carrying out a particular task in the system. Such systems can be found in gaming, automotive real-time systems and video/image encoding devices. These systems are commonly deployed to overcome deadline misses, which are primarily due to overloading of a single multitasking core. In this paper, we explore the use of multiple cores for a single application, as opposed to multiple applications executing in parallel. A single application is parallelized using two different methods: first, a master-slave model; and second, a sequential pipeline model. The systems were implemented using Tensilica's Xtensa LX processors, with queues as the means of communication between cores. In the master-slave model, we used a coarse-grained approach whereby a main core distributes the workload to the remaining cores and reads the processed data before writing the results back to file. In the pipeline model, a finer granularity is used: the application is partitioned into multiple sequential blocks, each block representing a stage in a sequential pipeline. For both models we applied a number of differing configurations, ranging from a single core to a nine-core system. We found that, without any optimization, for the seven-core system the sequential pipeline approach uses area more efficiently, with an area increase to speedup ratio of 1.83 compared to 4.34 for the master-slave approach. With selective optimization in the pipeline approach, we obtained speedups of up to 4.6× with an area increase of only 3.1× (an area increase to speedup ratio of just 0.68).
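The two partitioning models described in the abstract can be illustrated with a small software sketch. This is not the authors' implementation: Python threads and `queue.Queue` stand in for the Xtensa LX cores and their hardware queues, and the three stage functions (`stage_a`, `stage_b`, `stage_c`) are hypothetical placeholders for the JPEG processing steps. The helper `area_to_speedup_ratio` assumes the metric is simply area increase divided by speedup.

```python
import queue
import threading

# Hypothetical stand-ins for the application's sequential blocks;
# the real JPEG stages are not given in the abstract.
def stage_a(x): return x + 1
def stage_b(x): return x * 2
def stage_c(x): return x - 3

STAGES = [stage_a, stage_b, stage_c]
_DONE = object()  # sentinel that tells a worker or stage to stop


def process_block(x):
    """Run every stage on one block (what a single slave core would do)."""
    for stage in STAGES:
        x = stage(x)
    return x


def master_slave(blocks, n_workers=2):
    """Coarse-grained model: a master hands whole blocks to workers."""
    blocks = list(blocks)
    work, results = queue.Queue(), queue.Queue()

    def worker():
        while True:
            item = work.get()
            if item is _DONE:
                break
            idx, block = item
            results.put((idx, process_block(block)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in enumerate(blocks):   # master distributes the workload
        work.put(item)
    for _ in threads:                # one stop sentinel per worker
        work.put(_DONE)
    out = [None] * len(blocks)
    for _ in blocks:                 # master reads the processed data back
        idx, value = results.get()
        out[idx] = value
    for t in threads:
        t.join()
    return out


def pipeline(blocks):
    """Finer-grained model: one worker per stage, linked by queues."""
    queues = [queue.Queue() for _ in range(len(STAGES) + 1)]

    def run_stage(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)     # pass the stop signal downstream
                break
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=run_stage, args=(s, queues[i], queues[i + 1]))
        for i, s in enumerate(STAGES)
    ]
    for t in threads:
        t.start()
    for block in blocks:
        queues[0].put(block)
    queues[0].put(_DONE)
    out = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out


def area_to_speedup_ratio(area_increase, speedup):
    """Assumed definition: area increase divided by speedup (lower is better)."""
    return area_increase / speedup
```

Both functions return the same results for the same input; they differ only in how the work is split. In the master-slave sketch each worker runs all stages on a whole block, while in the pipeline sketch each worker runs a single stage and forwards its output through a queue. Under the assumed definition, `area_to_speedup_ratio(3.1, 4.6)` evaluates to roughly 0.67 from the rounded figures, close to the 0.68 quoted in the abstract.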