Subword parallelism is a technique that enables the full use of word-oriented datapaths when dealing with lower-precision data. It is a form of low-cost, small-scale SIMD parallelism. This paper discusses what subword parallelism requires and gives an example of how it can be supported with a very small set of instructions, MAX-2. MAX-2 is the second generation of Multimedia Acceleration eXtensions, introduced with the PA-RISC 2.0 processors. Because MAX-2 strives to be a minimal set supporting subword parallelism, we discuss both the instructions it includes and those it excludes. Mix and Permute instructions are introduced as new, simple yet powerful subword rearrangement primitives. We illustrate how programs with inherent data parallelism may be accelerated with subword-parallel instructions. We also give many short examples capturing key aspects of the use of MAX-2 instructions.
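
The idea can be sketched in software. The following is a minimal illustration, not the MAX-2 definition: it assumes a 64-bit word holding four 16-bit subwords, and the helper names (`pack16`, `unpack16`, `parallel_add16`, `mix_left`, `mix_right`, `permute`) are hypothetical. The lane selection shown for the mix operations is one plausible reading of "mix" as interleaving alternate subwords from two source words, and `permute` allows repeated lanes.

```python
MASK64 = (1 << 64) - 1
HIGH = 0x8000_8000_8000_8000  # top bit of each 16-bit lane
LOW = 0x7FFF_7FFF_7FFF_7FFF   # low 15 bits of each 16-bit lane


def pack16(lanes):
    """Pack four 16-bit values into one 64-bit word; lanes[0] is leftmost."""
    word = 0
    for value in lanes:
        word = (word << 16) | (value & 0xFFFF)
    return word


def unpack16(word):
    """Split a 64-bit word into four 16-bit lanes, leftmost first."""
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def parallel_add16(a, b):
    """Add four 16-bit lanes at once, wrapping modulo 2**16 in each lane.

    The low 15 bits are added normally, so a carry can reach bit 15 of
    its own lane but never cross into the next lane. The top bit of each
    lane is then fixed up with XOR.
    """
    low_sum = (a & LOW) + (b & LOW)
    return (low_sum ^ ((a ^ b) & HIGH)) & MASK64


def mix_left(a, b):
    """Interleave the even-indexed lanes of a and b (assumed semantics)."""
    x, y = unpack16(a), unpack16(b)
    return pack16([x[0], y[0], x[2], y[2]])


def mix_right(a, b):
    """Interleave the odd-indexed lanes of a and b (assumed semantics)."""
    x, y = unpack16(a), unpack16(b)
    return pack16([x[1], y[1], x[3], y[3]])


def permute(a, order):
    """Rearrange the lanes of a; indices in order may repeat."""
    x = unpack16(a)
    return pack16([x[i] for i in order])


a = pack16([1, 0xFFFF, 0x7FFF, 100])
b = pack16([2, 1, 1, 200])
total = unpack16(parallel_add16(a, b))  # one word-wide add, four results
```

A single word-wide operation here produces four independent 16-bit sums, which is the sense in which the datapath is fully used for lower-precision data. A real implementation would do this in hardware in one instruction; the masking above is only needed because Python integers have no lane boundaries.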