A Hardware Task Scheduler for Embedded Video Processing
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Flexible architectural support for fine-grain scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Processing data streams with hard real-time constraints on heterogeneous systems
Proceedings of the international conference on Supercomputing
Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures
ACM Transactions on Embedded Computing Systems (TECS)
An efficient scheduler of RTOS for multi/many-core system
Computers and Electrical Engineering
Hardware support for fine-grained event-driven computation in Anton 2
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
High-Speed FPGA Architecture for CABAC Decoding Acceleration in H.264/AVC Standard
Journal of Signal Processing Systems
Hi-index | 0.00 |
Efficient utilization of multi-core architectures relies on the partitioning of applications into tasks and mapping the tasks to cores. In some applications (e.g. H.264 video decoding parallelized at macro-block level) these tasks have dependencies among each other. Task scheduling, consisting of selecting a task with satisfied dependencies and mapping it to a core, is typically a functionality delegated to the Operating System. In this paper we present a hardware Task Management Unit (TMU) that looks ahead in time to find tasks to be executed by a multi-core architecture. The look-ahead functionality is shown to reduce the task management overhead by 40-50% when executing a parallelized version of an H.264 video decoder on an architecture with up to 16 cores. In overall, the TMU-based multi-core architecture reaches a speedup of more than 14x on 16 cores running H.264 video decoding, assuming CABAC is implemented in a dedicated coprocessor.