Optimizing compiler for shared-memory multiple SIMD architecture

Authors:
Weihua Zhang; Xinglong Qian; Ye Wang; Binyu Zang; Chuanqi Zhu
Affiliations:
Fudan University, Shanghai, China and Chinese Academy of Sciences; Fudan University, Shanghai, China; Fudan University, Shanghai, China; Fudan University, Shanghai, China; Fudan University, Shanghai, China
Venue:
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Year:
2006

Abstract

The rapid growth of multimedia and game applications puts increasing pressure on the processing ability of modern processors. Multiple SIMD architectures are widely used as multimedia accelerators. Because of power consumption and chip size constraints, shared-memory multiple SIMD (SM-SIMD) architectures are mainly used in embedded SoCs. To better fit mobile environments, the number of registers is also limited. Although the shared-memory multiple SIMD architecture simplifies chip design, these constraints are the major obstacles to mapping real multimedia applications onto it. To the best of our knowledge, there has been little research on optimization techniques for shared-memory multiple SIMD architectures. In this paper, we present a compiler framework that aims to automatically generate high-performance code for shared-memory multiple SIMD architectures. The framework reduces contention for the shared data bus by increasing register locality, improves data bus utilization through read-only data vector replication, and addresses the limited number of registers with a resource allocation algorithm. It also handles issues concerning data transformation. Experimental results show that the framework successfully maps real multimedia applications to shared-memory multiple SIMD architectures: it achieves an average speedup of 3.19x and an average utilization of 52.6% on an SM-SIMD architecture with 8 SIMD units.