ALP: Efficient support for all levels of parallelism for complex media applications

Authors:
Ruchira Sasanka; Man-Lap Li; Sarita V. Adve; Yen-Kuang Chen; Eric Debes
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, Illinois; University of Illinois at Urbana-Champaign, Urbana, Illinois; University of Illinois at Urbana-Champaign, Urbana, Illinois; Intel Corporation, Santa Clara, California; Intel Corporation, Santa Clara, California
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2007

Abstract

The real-time execution of contemporary complex media applications requires energy-efficient processing capabilities beyond those of current superscalar processors. We observe that the complexity of contemporary media applications requires support for multiple forms of parallelism, including ILP, TLP, and various forms of DLP, such as subword SIMD, short vectors, and streams. Based on our observations, we propose an architecture, called ALP, that efficiently integrates all of these forms of parallelism with evolutionary changes to the programming model and hardware. The novel part of ALP is a DLP technique called SIMD vectors and streams (SVectors/SStreams), which is integrated within a conventional superscalar-based CMP/SMT architecture with subword SIMD. This technique lies between subword SIMD and vectors, providing significant benefits over the former at a lower cost than the latter. Our evaluations show that each form of parallelism supported by ALP is important. Specifically, SVectors/SStreams are effective: compared to a system with the other enhancements in ALP, they give speedups of 1.1 to 3.4X and energy-delay product improvements of 1.1 to 5.1X for applications with DLP.