A unified vector/scalar floating-point architecture
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
An evaluation of memory consistency models for shared-memory systems with ILP processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Design and evaluation of dynamic access ordering hardware
ICS '96 Proceedings of the 10th international conference on Supercomputing
Simple vector microprocessors for multimedia applications
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Adding a vector unit to a superscalar processor
ICS '99 Proceedings of the 13th international conference on Supercomputing
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
MOM: a matrix SIMD instruction set architecture for multimedia applications
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
Efficient conditional operations for data-parallel architectures
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Computers
The architecture of the DIVA processing-in-memory chip
ICS '02 Proceedings of the 16th international conference on Supercomputing
Advanced Computer Architecture: Parallelism,Scalability,Programmability
Advanced Computer Architecture: Parallelism,Scalability,Programmability
Tarantula: a vector extension to the alpha architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Multithreaded Vector Architectures
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Overcoming the limitations of conventional vector processors
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Vector microprocessors
Pseudo-vector machine for embedded applications
Pseudo-vector machine for embedded applications
Scalable vector media-processors for embedded systems
Scalable vector media-processors for embedded systems
The energy efficiency of CMP vs. SMT for multimedia workloads
Proceedings of the 18th annual international conference on Supercomputing
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Evaluating the Imagine Stream Architecture
Proceedings of the 31st annual international symposium on Computer architecture
The Vector-Thread Architecture
Proceedings of the 31st annual international symposium on Computer architecture
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
The CSI multimedia architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Journal of Signal Processing Systems
Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures
ACM Transactions on Embedded Computing Systems (TECS)
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
The real-time execution of contemporary complex media applications requires energy-efficient processing capabilities beyond those of current superscalar processors. We observe that the complexity of contemporary media applications requires support for multiple forms of parallelism, including ILP, TLP, and various forms of DLP, such as subword SIMD, short vectors, and streams. Based on our observations, we propose an architecture, called ALP, that efficiently integrates all of these forms of parallelism with evolutionary changes to the programming model and hardware. The novel part of ALP is a DLP technique called SIMD vectors and streams (SVectors/SStreams), which is integrated within a conventional superscalar-based CMP/SMT architecture with subword SIMD. This technique lies between subword SIMD and vectors, providing significant benefits over the former at a lower cost than the latter. Our evaluations show that each form of parallelism supported by ALP is important. Specifically, SVectors/SStreams are effective, compared to a system with the other enhancements in ALP. They give speedups of 1.1 to 3.4X and energy-delay product improvements of 1.1 to 5.1X for applications with DLP.