Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Hardware-Software Interactions on Mpact
IEEE Micro
Datapath design for a VLIW Video Signal Processor
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data-path synthesis of VLIW video signal processors
Proceedings of the 11th international symposium on System synthesis
Eclipse: Heterogeneous Multiprocessor Architecture for Flexible Media Processing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Dynamic Parallel media processing using Speculative Broadcast Loop (SBL)
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Quantifying behavioral differences between multimedia and general-purpose workloads
Journal of Systems Architecture: the EUROMICRO Journal
Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Experiences with optimizing two stream-based applications for cluster execution
Journal of Parallel and Distributed Computing
Inter-cluster communication in VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO)
MediaBench II video: Expediting the next generation of video systems research
Microprocessors & Microsystems
Evaluation of bus based interconnect mechanisms in clustered VLIW architectures
International Journal of Parallel Programming
Parallel Scalability of Video Decoders
Journal of Signal Processing Systems
Hi-index | 0.00 |
Most recent research in instruction-level parallelism has focused on general-purpose applications such as the SPEC benchmarks. Many quantitative experiments have been performed over the years measuring the impact of different execution models and optimization techniques on these applications. Researchers have been developing various ILP architectures for media processors in order to exploit parallelism in audio, video, and graphics applications. It has been assumed that these applications contain far more potential parallelism than general-purpose code, but there have been few attempts to quantify the available parallelism. We present a linear complexity global scheduling algorithm that can process very long traces up to 1 billion operations. Therefore, traces of video applications such as MPEG1, MPEG2, MPEG4 and H.263 encoders and decoders can be analyzed. Using an idealized execution model, speedups of over 1000 have been found in some applications. The experiment shows that eliminating currently identifiable bottlenecks can allow the exploitation of huge amounts of ILP in audio and video applications.