Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Concurrency Extraction Via Hardware Methods Executing the Static Instruction Stream
IEEE Transactions on Computers
The multiscalar architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences
Proceedings of the 24th annual international symposium on Computer architecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Decoupled access/execute computer architectures
ACM Transactions on Computer Systems (TOCS)
Communications of the ACM - Special issue on computer architecture
Speculative dynamic vectorization
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Control-Flow Independence Reuse via Dynamic Vectorization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Power-efficient instruction delivery through trace reuse
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Challenges in exploitation of loop parallelism in embedded applications
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Proceedings of the 13th international symposium on Low power electronics and design
On the exploitation of loop-level parallelism in embedded applications
ACM Transactions on Embedded Computing Systems (TECS)
LPA: a first approach to the loop processor architecture
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Hi-index | 0.00 |
Several ILP limit studies indicate the presence of considerable ILP across dynamically far-apart instructions in program execution. This paper proposes a hardware mechanism, dynamic vectorization (DV), as a tool for quickly building up a large logical instruction window. Dynamic vectorization converts repetitive dynamic instruction sequences into vector form, enabling the processing of instructions from beyond the corresponding program loop to be overlapped with the loop. This enables vector-like execution of programs with relatively complex static control flow that may not be amenable to static, compile time vectorization. Experimental evaluation shows that a large fraction of the dynamic instructions of four of the six SPECInt92 programs can be captured in vector form. Three of these programs exhibit significant potential for ILP improvements from dynamic vectorization, with speedups of more than a factor of 2 in a scenario of realistic branch prediction and perfect memory disambiguation. Under perfect branch prediction conditions, a fourth program also shows well over a factor of 2 speedup from DV. The speedups are due to the overlap of post-loop processing with loop processing.