Optimal pipelining in supercomputers
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Characterization of branch and data dependencies on programs for evaluating pipeline performance
IEEE Transactions on Computers
Simulation and analysis of a pipeline processor
WSC '89 Proceedings of the 21st conference on Winter simulation
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Profetching and memory system behavior of the SPEC95 benchmark suite
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Dynamic addressing memory arrays with physical locality
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Exploiting data-width locality to increase superscalar execution bandwidth
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Microarchitectural denial of service: insuring microarchitectural fairness
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Recycling waste: exploiting wrong-path execution to improve branch prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Dynamic memory instruction bypassing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Dynamic Data Dependence Tracking and its Application to Branch Prediction
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Optimum Power/Performance Pipeline Depth
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Use-Based Register Caching with Decoupled Indexing
Proceedings of the 31st annual international symposium on Computer architecture
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Power-optimal pipelining in deep submicron technology
Proceedings of the 2004 international symposium on Low power electronics and design
IBM Journal of Research and Development
Alloyed branch history: combining global and local branch history for robust performance
International Journal of Parallel Programming
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamically Trading Frequency for Complexity in a GALS Microprocessor
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Effects of speculation on performance and issue queue design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The optimum pipeline depth considering both power and performance
ACM Transactions on Architecture and Code Optimization (TACO)
Control Speculation for Energy-Efficient Next-Generation Superscalar Processors
IEEE Transactions on Computers
Dynamic memory instruction bypassing
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
A performance counter architecture for computing accurate CPI components
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
An analysis of the effects of miss clustering on the cost of a cache miss
Proceedings of the 4th international conference on Computing frontiers
ReCycle:: pipeline adaptation to tolerate process variation
Proceedings of the 34th annual international symposium on Computer architecture
VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 2007 workshop on Experimental computer science
ecs'07 Experimental computer science on Experimental computer science
A superscalar simulation employing poisson distributed stalls
Computers and Electrical Engineering
Toward a multicore architecture for real-time ray-tracing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A mechanistic performance model for superscalar out-of-order processors
ACM Transactions on Computer Systems (TOCS)
Profile-based dynamic pipeline scaling
The Journal of Supercomputing
Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design
ACM SIGARCH Computer Architecture News
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Finding representative workloads for computer system design
Finding representative workloads for computer system design
Compiler support for dynamic pipeline scaling
EUC'07 Proceedings of the 2007 international conference on Embedded and ubiquitous computing
Challenges and methodologies for efficient power budgeting across the die
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Fine grain pipeline systems for real-time motion and stereo-vision computation
International Journal of High Performance Systems Architecture
Applied inference: Case studies in microarchitectural design
ACM Transactions on Architecture and Code Optimization (TACO)
Automatic microarchitectural pipelining
Proceedings of the Conference on Design, Automation and Test in Europe
System-level process variability analysis and mitigation for 3D MPSoCs
Proceedings of the Conference on Design, Automation and Test in Europe
Comparing FPGA vs. custom cmos and the impact on processor microarchitecture
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Journal of Computer Science and Technology
Pipeline strategy for improving optimal energy efficiency in ultra-low voltage design
Proceedings of the 48th Design Automation Conference
CPU DB: recording microprocessor history
Communications of the ACM
Micro-architecture performance estimation by formula
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
CPU DB: Recording Microprocessor History
Queue - Processors
Hi-index | 0.02 |
The impact of pipeline length on the performance of a microprocessor is explored both theoretically and by simulation. An analytical theory is presented that shows two opposing architectural parameters affect the optimal pipeline length: the degree of instruction level parallelism (superscalar) decreases the optimal pipeline length, while the lack of pipeline stalls increases the optimal pipeline length. This theory is tested by analyzing the optimal pipeline length for 35 applications representing three classes of workloads. Trace tapes are collected from SPEC95 and SPEC2000 applications, traditional (legacy) database and on-line transaction processing (OLTP) applications, and modern (e. g. web) applications primarily written in Java and C++. The results show that there is a clear and significant difference in the optimal pipeline length between the SPEC workloads and both the legacy and modern applications. The SPEC applications, written in C, optimize to a shorter pipeline length than the legacy applications, largely written in assembler language, with relatively little overlap in the two distributions. Additionally, the optimal pipeline length distribution for the C++ and Java workloads overlaps with the legacy applications, suggesting similar workload characteristics. These results are explored across a wide range of superscalar processors, both in-order and out-of-order.