Register bypassing is widely used in modern processors to eliminate certain data hazards. Complete bypassing is ideal for performance, but it significantly increases the cycle time, area, and power consumption of the processor. Because embedded processor systems face strict constraints on performance, cost, and power consumption, architects compromise among these design parameters by implementing partial bypassing. Partial bypassing, however, complicates compilation: the data hazard detection and/or avoidance techniques traditionally used in retargetable compilers assume a constant operation latency, and they break down when bypassing is incomplete. In this article, we present operation tables (OTs), which accurately detect data hazards even in the presence of incomplete bypassing. OTs integrate the detection of all kinds of pipeline hazards in a unified framework and can therefore be easily deployed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor, running embedded applications from the MiBench suite, demonstrate that accurate pipeline hazard detection by OTs can yield up to 20% performance improvement over the best-performing GCC-generated code. Finally, we demonstrate the usefulness of OTs across various bypass configurations of the Intel XScale.
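To make concrete why a constant operation latency fails under partial bypassing, the following is a minimal sketch on a toy pipeline. It is not the operation-table mechanism itself, and none of it comes from the XScale: the five-stage pipeline, the rule that operands are read when the consumer enters EX, the one-cycle register-file write delay, and every name (`STAGES`, `stall_cycles`, `constant_latency_stalls`) are assumptions made purely for illustration.

```python
# Toy model (hypothetical, not the XScale): an instruction issued at cycle p
# occupies stage i in cycle p + i when there are no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
STAGE_INDEX = {name: i for i, name in enumerate(STAGES)}

# Assumption: the consumer reads its operands in the cycle it enters EX.
OPERAND_READ_STAGE = "EX"
# Assumption: the result is written in WB and readable from the register
# file from the following cycle onward.
REGFILE_DELAY = len(STAGES)
# Assumption: the result is computed in EX, so forwarding is only possible
# while the producer sits in one of these later stages.
BYPASSABLE = {"MEM", "WB"}


def operand_ready(producer_issue, cycle, bypasses):
    """True if the producer's result can be read in the given cycle."""
    if cycle >= producer_issue + REGFILE_DELAY:
        return True  # value is already in the register file
    # Otherwise a bypass must exist from the stage the producer occupies now.
    return any(cycle == producer_issue + STAGE_INDEX[s] for s in bypasses)


def stall_cycles(producer_issue, consumer_issue, bypasses):
    """Stalls the consumer suffers waiting for the producer's result."""
    assert set(bypasses) <= BYPASSABLE, "unknown bypass source stage"
    read_cycle = consumer_issue + STAGE_INDEX[OPERAND_READ_STAGE]
    stalls = 0
    while not operand_ready(producer_issue, read_cycle + stalls, bypasses):
        stalls += 1
    return stalls


def constant_latency_stalls(producer_issue, consumer_issue, latency):
    """Stalls predicted by a scheduler that assumes one fixed latency."""
    return max(0, latency - (consumer_issue - producer_issue))


if __name__ == "__main__":
    partial = {"MEM"}  # forwarding from MEM only; no path from WB
    for distance in (1, 2, 3):
        actual = stall_cycles(0, distance, partial)
        predicted = constant_latency_stalls(0, distance, latency=1)
        print(f"distance={distance} actual={actual} constant-latency={predicted}")
```

In this toy model with a bypass from MEM only, a consumer issued one cycle after the producer does not stall, but one issued two cycles after it stalls for one cycle, because the result has left the bypassed stage and has not yet reached the register file. No single latency value describes both cases: a latency of 1 misses the stall at distance 2, while a conservative latency of 3 predicts stalls at distance 1 that never occur.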