Journal of Signal Processing Systems
Decoupled architectures are fine-grain processors that partition the memory access and execute functions of a computer program and exploit the parallelism between the two. Although some concepts from the traditional decoupled access/execute paradigm made their way into commercial processors, decoupled designs encountered resistance in general-purpose applications because these applications are not very structured or regular. However, multimedia applications have recently become a dominant workload on desktops and workstations. Media applications are highly structured and regular, and they lend themselves well to the decoupling concept. In this paper, we present an architecture that decouples the useful/true computations from the overhead/supporting instructions in media applications. The proposed scheme is incorporated into an out-of-order general-purpose processor enhanced with SIMD extensions. Explicit hardware support is provided to exploit instruction-level parallelism in the overhead component. Performance evaluation shows that such hardware can significantly improve performance over conventional SIMD-enhanced general-purpose processors. Results on nine multimedia benchmarks show that the proposed MediaBreeze architecture provides a 1.05x to 16.7x performance improvement over a 2-way out-of-order SIMD machine. With slip-based data prefetching, a performance improvement of up to 28x is observed.