Modern processors employ a large amount of hardware to dynamically detect parallelism in single-threaded programs and maintain the sequential semantics implied by these programs. The complexity of some of this hardware diminishes the gains due to parallelism because of a longer clock period or increased pipeline latency of the machine.

In this paper we propose a processor implementation which dynamically schedules groups of instructions while executing them on a fast simple engine and caches them for repeated execution on a fast VLIW-type engine. Our experiments show that scheduling groups spanning several basic blocks and caching these scheduled groups results in a significant performance gain over fill buffer approaches for a standard VLIW cache.

This concept, which we call DIF (Dynamic Instruction Formatting), unifies and extends principles underlying several schemes being proposed today to reduce superscalar processor complexity. This paper examines various issues in designing such a processor and presents results of experiments using trace-driven simulation of SPECint95 benchmark programs.
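The schedule-once, replay-many idea described in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the names `Instr`, `schedule_group`, and `GroupCache`, the greedy placement rule, and the one-cycle-per-instruction cost model are all assumptions made for illustration. It also assumes that only true (read-after-write) register dependences constrain placement, and it omits the check that a cached group's branch outcomes still match the current execution path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: address, destination and source registers."""
    pc: int
    dest: Optional[str] = None
    srcs: tuple = ()


def schedule_group(trace, width):
    """Greedily pack a sequential trace into rows of at most `width` instructions.

    Each instruction goes into the earliest row that comes after the rows of
    all its producers and still has a free slot. Only read-after-write
    dependences are tracked (an assumption; false dependences are ignored).
    """
    rows = []
    ready_row = {}  # register -> first row in which its value may be read
    for ins in trace:
        row = max((ready_row.get(r, 0) for r in ins.srcs), default=0)
        while row < len(rows) and len(rows[row]) >= width:
            row += 1
        if row == len(rows):
            rows.append([])
        rows[row].append(ins)
        if ins.dest is not None:
            ready_row[ins.dest] = row + 1
    return rows


class GroupCache:
    """Toy cache of scheduled groups, keyed by the address of the first instruction."""

    def __init__(self, width):
        self.width = width
        self.groups = {}

    def run(self, trace):
        """Return a toy cycle count for executing `trace` once.

        Miss: the simple engine takes one cycle per instruction while the
        group is scheduled and stored. Hit: the wide engine takes one cycle
        per row of the cached group.
        """
        key = trace[0].pc
        if key in self.groups:
            return len(self.groups[key])
        self.groups[key] = schedule_group(trace, self.width)
        return len(trace)
```

In this model, the first call to `run` for a given trace costs as many cycles as the trace has instructions, and later calls cost only as many cycles as the cached group has rows, which is where the benefit of repeated execution would come from under these assumptions.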