A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Applied cryptography (2nd ed.): protocols, algorithms, and source code in C
Applied cryptography (2nd ed.): protocols, algorithms, and source code in C
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Advanced compiler design and implementation
Advanced compiler design and implementation
IEEE Transactions on Computers
The design of the MGAP-2: a micro-grained massively parallel array
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on system-level interconnect prediction
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
A Matrix-Based Approach to the Global Locality Optimization Problem
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Automatic compilation to a coarse-grained reconfigurable system-opn-chip
ACM Transactions on Embedded Computing Systems (TECS)
Design and Analysis of a Programmable Single-Chip Architecture for DVB-T Base-Band Receiver
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Boosting the performance of multimedia applications using SIMD instructions
CC'05 Proceedings of the 14th international conference on Compiler Construction
Overflow controlled SIMD arithmetic
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Data pipeline optimization for shared memory multiple-SIMD architecture
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Hi-index | 0.00 |
With the rapid growth of multimedia and game, these applications put more and more pressure on the processing ability of modern processors. Multiple SIMD architecture is widely used in multimedia processing field as a multimedia accelerator.With the consideration of power consumption and chip size, shared memory multiple SIMD architecture is mainly used in embedded SOCs. In order to further fit mobile environment, there is the constraint of limited register number as well. Although shared memory multiple SIMD architecture simplify the chip design, these constraints are the major obstacles to map the real multimedia applications to these architectures. Until now, to our best knowledge, there is little research on the optimizing techniques for shared memory multiple SIMD architecture.In this paper, we present a compiler framework, which aims at automatically generating high performance codes for shared memory multiple SIMD architecture. In this framework, we reduce the competition of shared data bus through increasing the register locality, improve the utilization of data bus by read-only data vector replication and solve the problem of limited register number through a resource allocation algorithm. The framework also handlers the issues concerning on data transformation. As the experimental results shown, this framework is successful in mapping real multimedia applications to shared memory multiple SIMD architecture. It leads to an average speedup by a factor of 3.19 and an average utilization of SM-SIMD architecture with 8 SIMD units by a factor of 52.6%.