The multiscalar processing model extracts instruction-level parallelism from ordinary programs by splitting a program into smaller, possibly dependent, tasks and executing several of them in parallel on multiple execution units. Past work advocated pursuing multiple flows of control in the multiscalar processor. We first illustrate the problems involved in pursuing multiple flows of control. We then discuss a methodology for obtaining good performance from multiple tasks extracted from a single line of control. We also present the results of simulation studies that verify the potential of this method. These results, obtained with the SPEC92 benchmarks, show higher issue rates when a single line of control is pursued in the multiscalar processor. The primary reason for this improvement is better load balancing among the execution units.