Available parallelism in video applications

Authors:
Heng Liao; Andrew Wolfe
Affiliations:
Princeton University; Princeton University
Venue:
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Year:
1997

Citing 6

Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems

Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems

Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems

Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture

Hardware-Software Interactions on Mpact
IEEE Micro

Datapath design for a VLIW Video Signal Processor
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture

Cited 12

A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture

Data-path synthesis of VLIW video signal processors
Proceedings of the 11th international symposium on System synthesis

Eclipse: Heterogeneous Multiprocessor Architecture for Flexible Media Processing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium

Dynamic Parallel media processing using Speculative Broadcast Loop (SBL)
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium

Quantifying behavioral differences between multimedia and general-purpose workloads
Journal of Systems Architecture: the EUROMICRO Journal

Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW and Superscalar Architectures
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors

Experiences with optimizing two stream-based applications for cluster execution
Journal of Parallel and Distributed Computing

Inter-cluster communication in VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO)

A video specific instruction set architecture for ASIP design
VLSI Design

MediaBench II video: Expediting the next generation of video systems research
Microprocessors & Microsystems

Evaluation of bus based interconnect mechanisms in clustered VLIW architectures
International Journal of Parallel Programming

Parallel Scalability of Video Decoders
Journal of Signal Processing Systems

Abstract

Most recent research in instruction-level parallelism (ILP) has focused on general-purpose applications such as the SPEC benchmarks. Many quantitative experiments have been performed over the years to measure the impact of different execution models and optimization techniques on these applications. Researchers have also been developing various ILP architectures for media processors in order to exploit parallelism in audio, video, and graphics applications. It has been assumed that these applications contain far more potential parallelism than general-purpose code, but there have been few attempts to quantify the available parallelism. We present a linear-complexity global scheduling algorithm that can process very long traces of up to 1 billion operations. This makes it possible to analyze traces of video applications such as MPEG-1, MPEG-2, MPEG-4, and H.263 encoders and decoders. Using an idealized execution model, speedups of over 1000 have been found in some applications. The experiments show that eliminating currently identifiable bottlenecks can allow the exploitation of huge amounts of ILP in audio and video applications.
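
The abstract does not spell out the scheduling algorithm, so the following is only an illustrative sketch of how a trace-driven parallelism measurement of this general kind might look, not the authors' method. It assumes a textbook dataflow-limit model: unit latency for every operation, unlimited functional units, perfect renaming (only true read-after-write dependences are honored), and no control-flow constraints. The `Op` record and the `ideal_speedup` function are hypothetical names introduced for this example. Each operation is visited once and costs a few hash-table lookups, so the pass is linear in trace length.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Op(NamedTuple):
    """One traced operation (hypothetical record format for this sketch)."""
    dest: Optional[str]      # register or memory location written, if any
    srcs: Tuple[str, ...]    # locations read by the operation


def ideal_speedup(trace: Iterable[Op]) -> float:
    """Return (operation count) / (dataflow critical-path length).

    Assumes unit latency, unlimited resources, and perfect renaming,
    so only true (read-after-write) dependences delay an operation.
    """
    ready: Dict[str, int] = {}  # location -> cycle its latest value is available
    n_ops = 0
    depth = 0                   # length of the longest dependence chain so far
    for op in trace:
        # An operation can start once all of its source values are available.
        start = max((ready.get(s, 0) for s in op.srcs), default=0)
        finish = start + 1
        if op.dest is not None:
            ready[op.dest] = finish
        depth = max(depth, finish)
        n_ops += 1
    return n_ops / depth if depth else 0.0


# Six operations whose longest dependence chain (a or b -> c -> d) has length 3.
example = [
    Op("a", ()), Op("b", ()),
    Op("c", ("a", "b")), Op("d", ("c",)),
    Op("e", ()), Op("f", ()),
]
print(ideal_speedup(example))  # 2.0
```

Because the function consumes any iterable, a trace could be streamed from a file or generator rather than held in memory, which matters at the trace lengths the abstract mentions. Only the `ready` table grows, with the number of distinct locations written.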