Data pipeline optimization for shared memory multiple-SIMD architecture

Authors:
Weihua Zhang; Tao Bao; Binyu Zang; Chuanqi Zhu
Affiliations:
Parallel Processing Institute, Fudan University, Shanghai, China and Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Parallel Processing Institute, Fudan University, Shanghai, China; Parallel Processing Institute, Fudan University, Shanghai, China; Parallel Processing Institute, Fudan University, Shanghai, China
Venue:
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Year:
2006

Abstract

The rapid growth of multimedia applications puts increasing pressure on the processing capability of modern processors, and as a result more and more multimedia processors employ parallel single instruction multiple data (SIMD) units to achieve high performance. In embedded systems-on-chip (SoCs), the shared-memory multiple-SIMD architecture has become popular because of its lower power consumption and smaller chip size. To match the properties of some multimedia applications, such architectures provide interconnections among the SIMD units. In this paper, we present a novel program transformation technique that exploits the parallel and pipelined computing power of modern shared-memory multiple-SIMD architectures. The technique can greatly reduce contention on the shared data bus and improve the performance of applications with inherent data pipeline characteristics. Experimental results show that our method provides impressive speedup: on a shared-memory multiple-SIMD architecture with 8 SIMD units, it obtains more than 3.6X speedup for the tested multimedia programs.