Compiling for vector-thread architectures

Authors:
Mark Hampton;Krste Asanovic
Affiliations:
MIT Computer S ien e and Artificial Intelligence Laboratory, Cambridge, MA, USA;University of California at Berkeley, Berkeley, CA, USA
Venue:
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Year:
2008

Citing 27
Cited 5

A variable instruction stream extension to the VLIW architecture

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Multi-threaded vectorization

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
Program Improvement by Source-to-Source Transformation

Journal of the ACM (JACM)
Exploiting superword level parallelism with multimedia instruction sets

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Implementing an efficient vector instruction set in a chip multi-processor using micro-threaded pipelines

ACSAC '01 Proceedings of the 6th Australasian conference on Computer systems architecture
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Imagine: Media Processing with Streams

IEEE Micro
Balancing Fine- and Medium-Grained Parallelism in Scheduling Loops for the XIMD Architecture

PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Region-based hierarchical operation partitioning for multicluster processors

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Energy-exposed instruction sets

Power aware computing
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
The Vector-Thread Architecture

Proceedings of the 31st annual international symposium on Computer architecture
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Cache Refill/Access Decoupling for Vector Machines

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
The Vector-Thread Architecture

IEEE Micro
Superword-Level Parallelism in the Presence of Control Flow

Proceedings of the international symposium on Code generation and optimization
Optimizing Compiler for the CELL Processor

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Exploiting Vector Parallelism in Software Pipelined Loops

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Compiling for EDGE Architectures

Proceedings of the International Symposium on Code Generation and Optimization
Compiling for stream processing

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A spatial path scheduling algorithm for EDGE architectures

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Challenges in exploitation of loop parallelism in embedded applications

CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Introducing Control Flow into Vectorized Code

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Vector-thread architecture and implementation

Vector-thread architecture and implementation
Trimaran: an infrastructure for research in instruction-level parallelism

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing

Implementing the scale vector-thread processor

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Outer-loop vectorization: revisited for short SIMD architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Stream Compilation for Real-Time Embedded Multicore Systems

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Proceedings of the 38th annual international symposium on Computer architecture
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators

ACM Transactions on Computer Systems (TOCS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Vector-thread (VT) architectures exploit multiple forms of parallelism simultaneously. This paper describes a compiler for the Scale VT architecture, which takes advantage of the VT features. We focus on compiling loops, and show how the compiler can transform code that poses difficulties for traditional vector or VLIW processors, such as loops with internal control flow or cross-iteration dependences, while still taking advantage of features not supported by multithreaded designs, such as vector memory instructions. We evaluate the compiler using several embedded benchmarks and show that we can obtain substantial speedups over a single-issue, in-order scalar machine.