Interpretation and instruction path coprocessing
Interpretation and instruction path coprocessing
Exploitation of instruction-level parallelism for detection of processor execution errors
Exploitation of instruction-level parallelism for detection of processor execution errors
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
A fill-unit approach to multiple instruction issue
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
Region-based compilation: an introduction and motivation
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exploiting instruction level parallelism in processors by caching scheduled groups
Proceedings of the 24th annual international symposium on Computer architecture
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Control independence in trace processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Completion time multiple branch prediction for enhancing trace cache performance
Proceedings of the 27th annual international symposium on Computer architecture
FX!32: A Profile-Directed Binary Translator
IEEE Micro
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
JMLC '97 Proceedings of the Joint Modular Languages Conference on Modular Programming Languages
Instruction Pre-Processing in Trace Processors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
PipeRench implementation of the instruction path coprocessor
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Slice-processors: an implementation of operation-based prediction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Improving Java performance using hardware translation
ICS '01 Proceedings of the 15th international conference on Supercomputing
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Implementing optimizations at decode time
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Managing multi-configuration hardware via dynamic working set analysis
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Performance characterization of a hardware mechanism for dynamic optimization
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Instruction Level Distributed Processing
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Instruction Level Distributed Processing: Adapting to Future Technology
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic trace selection using performance monitoring hardware sampling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic Optimization of Micro-Operations
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Macro-op Scheduling: Relaxing Scheduling Loop Constraints
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
An Event-Driven Multithreaded Dynamic Optimization Framework
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Reducing Startup Time in Co-Designed Virtual Machines
Proceedings of the 33rd annual international symposium on Computer Architecture
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Hi-index | 0.00 |
This paper presents the concept of an Instruction Path Coprocessor (I-COP), which is a programmable on-chip coprocessor, with its own mini-instruction set, that operates on the core processor's instructions to transform them into an internal format that can be more efficiently executed. It is located off the critical path of the core processor to ensure that it does not negatively impact the core processor's cycle time or pipeline depth. An I-COP is highly versatile and can be used to implement different types of instruction transformations to enhance the IPC of the core processor. We study four potential applications of the I-COP to demonstrate the feasibility of this concept and investigate the design issues of such a coprocessor. A prototype instruction set for the I-COP is presented along with an implementation framework that facilitates achieving high I-COP performance. Initial results indicate that the I-COP is able to efficiently implement the trace cache fill unit as well as the register move, stride data prefetching and linked data structure prefetching trace optimizations.