In this paper, we introduce a high-level simulation methodology for modeling multicore video processing architectures. The method enables design space exploration of parallel video processing applications (VPAs): it evaluates the performance of a VPA running on arbitrary virtual hardware and software configurations. It therefore offers an alternative to a "complete" decoder implementation on a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), and it is considerably more efficient in terms of time, labor, and cost. As an application, we use the method to design a parallel H.264 decoder targeting 720p25 resolution (1280 × 720, 25 frames/s) at bit rates of up to 50 Mb/s. Starting from a single-core decoder implementation, we use our simulator to estimate the performance gain of a multicore architecture. We then identify the major performance bottlenecks in this multicore system and split the decoder further at those points until the target requirements are met. We demonstrate both functional splitting (i.e., pipelining) and data-parallel processing. The final H.264 decoder architecture meets our performance requirements.
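The bottleneck-driven refinement loop described in the abstract (pipeline the decoder, find the slowest stage, split it further, repeat until the frame-rate target is met) can be illustrated with a deliberately simple analytical model. This is a minimal sketch under stated assumptions, not the authors' simulator: the `Stage` class, the `explore` function, the stage names, and the per-frame times are all hypothetical, and data-parallel splitting is assumed to give ideal linear speedup with no communication or synchronization cost.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str        # hypothetical pipeline stage name
    time_ms: float   # hypothetical single-core processing time per frame
    cores: int = 1   # cores assigned to this stage via data-parallel splitting

    def period_ms(self) -> float:
        # Assumption: work divides evenly across cores, no sync/communication cost.
        return self.time_ms / self.cores


def frame_period(stages):
    # With functional splitting (pipelining), steady-state throughput
    # is limited by the slowest stage.
    return max(s.period_ms() for s in stages)


def explore(stages, target_fps, max_cores):
    """Repeatedly split the bottleneck stage until the target frame rate is met.

    Returns a new list of stages, or None if the core budget is exhausted first.
    """
    config = [Stage(s.name, s.time_ms, s.cores) for s in stages]  # don't mutate input
    budget_ms = 1000.0 / target_fps
    while frame_period(config) > budget_ms:
        if sum(s.cores for s in config) >= max_cores:
            return None  # target not reachable within the core budget
        bottleneck = max(config, key=lambda s: s.period_ms())
        bottleneck.cores += 1  # one more data-parallel split of the bottleneck
    return config


# Hypothetical per-frame times (ms) for a three-stage decoder pipeline.
stages = [Stage("entropy", 30.0), Stage("reconstruction", 45.0), Stage("deblocking", 25.0)]
config = explore(stages, target_fps=25, max_cores=8)
print([(s.name, s.cores) for s in config], frame_period(config))
# prints [('entropy', 1), ('reconstruction', 2), ('deblocking', 1)] 30.0
```

In this toy model the single-core decoder needs 100 ms per frame (10 frames/s); pipelining alone leaves the 45 ms stage as the bottleneck against a 40 ms budget, and one data-parallel split of that stage brings the period down to 30 ms. A high-level simulator is useful precisely where this closed-form view breaks down, for example when inter-core communication, memory contention, or data dependencies between splits affect the achievable speedup, none of which this sketch models.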