Traditional vector architectures have been shown to be very effective for regular codes, where the compiler can detect data-level parallelism. However, this SIMD parallelism is also present in irregular or pointer-rich codes, where compilers are largely unable to discover it. In this paper we propose a microarchitecture extension that exploits SIMD parallelism speculatively. The idea is to predict, based on history information, when certain operations are likely to be vectorizable; in that case, these scalar instructions are executed in vector mode. The resulting vector instructions operate on several elements (vector operands) that are anticipated to be their input operands and produce a number of outputs that are stored in a vector register for use by later instructions. Verification of the correctness of the applied vectorization eventually changes the status of a given vector element from speculative to non-speculative, or alternatively, triggers a recovery action in case of misspeculation.

The proposed microarchitecture extension, applied to a 4-way issue superscalar processor with one wide bus, is 19% faster than the same processor with 4 scalar buses to the L1 data cache. This speedup is due mainly to 1) the reduction in the number of memory accesses, 15% for SpecInt and 20% for SpecFP, 2) the transformation of scalar arithmetic instructions into their vector counterparts, 28% for SpecInt and 23% for SpecFP, and 3) the exploitation of control independence for mispredicted branches.
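The paper does not spell out the predictor, but the history-based mechanism it describes can be illustrated with a minimal sketch. All names here are invented for illustration: a per-PC stride table marks a scalar load as vectorizable once the same stride repeats, after which the addresses of the next vector of elements are anticipated; each element starts speculative and is later checked against the address the scalar instruction actually computes.

```python
# Hypothetical sketch (names invented, not the paper's implementation):
# a history-based predictor that flags a scalar load as vectorizable once
# a stable stride has been confirmed, then anticipates the addresses of
# the next `vlen` instances for speculative vector execution.

class SpeculativeVectorizer:
    def __init__(self, vlen=4):
        self.vlen = vlen
        self.history = {}  # pc -> (last_addr, last_stride, confidence)

    def observe(self, pc, addr):
        """Update stride history for a scalar load at `pc` and return
        whether it now looks vectorizable (stride confirmed twice)."""
        last_addr, last_stride, conf = self.history.get(pc, (None, None, 0))
        if last_addr is not None:
            stride = addr - last_addr
            conf = conf + 1 if stride == last_stride else 0
            self.history[pc] = (addr, stride, conf)
        else:
            self.history[pc] = (addr, None, 0)
        return conf >= 2

    def speculative_addresses(self, pc):
        """Anticipated input addresses for the next `vlen` instances."""
        addr, stride, _ = self.history[pc]
        return [addr + stride * i for i in range(1, self.vlen + 1)]


# Usage: a load at pc=0x40 walking an array with an 8-byte stride.
sv = SpeculativeVectorizer()
vectorize = False
for addr in (1000, 1008, 1016, 1024):
    vectorize = sv.observe(0x40, addr)

if vectorize:
    spec = sv.speculative_addresses(0x40)  # [1032, 1040, 1048, 1056]
    # Each element starts speculative. Verification compares the address
    # the scalar instruction actually computes against the prediction,
    # promoting the element to non-speculative on a match or triggering
    # a recovery action on a mismatch.
    actual = 1032
    element_verified = (spec[0] == actual)
```

In the real microarchitecture this check would happen at commit time and a mismatch would squash the mispredicted vector elements; the sketch only captures the predict/verify control flow.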