Multiple-block ahead branch predictors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Alternative fetch and issue policies for the trace cache fetch mechanism
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Skew-tolerant circuit design
The Alpha 21264 Microprocessor
IEEE Micro
Predicting Conditional Branches With Fusion-Based Hybrid Predictors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Dynamic addressing memory arrays with physical locality
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Exploiting data-width locality to increase superscalar execution bandwidth
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Microarchitectural denial of service: insuring microarchitectural fairness
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Microarchitecture evaluation with physical planning
Proceedings of the 40th annual Design Automation Conference
Phi-Predication for light-weight if-conversion
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Recycling waste: exploiting wrong-path execution to improve branch prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Dynamic memory instruction bypassing
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Dynamic Data Dependence Tracking and its Application to Branch Prediction
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Proceedings of the 30th annual international symposium on Computer architecture
Detecting global stride locality in value streams
Proceedings of the 30th annual international symposium on Computer architecture
On-chip communication design: roadblocks and avenues
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Scalable Hardware Memory Disambiguation for High ILP Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Fast Path-Based Neural Branch Prediction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Optimum Power/Performance Pipeline Depth
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Wire Delay is Not a Problem for SMT (In the Near Future)
Proceedings of the 31st annual international symposium on Computer architecture
Prophet/Critic Hybrid Branch Prediction
Proceedings of the 31st annual international symposium on Computer architecture
Use-Based Register Caching with Decoupled Indexing
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Power-optimal pipelining in deep submicron technology
Proceedings of the 2004 international symposium on Low power electronics and design
IBM Journal of Research and Development
Alloyed branch history: combining global and local branch history for robust performance
International Journal of Parallel Programming
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Dynamically Trading Frequency for Complexity in a GALS Microprocessor
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Effects of speculation on performance and issue queue design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The optimum pipeline depth considering both power and performance
ACM Transactions on Architecture and Code Optimization (TACO)
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Improved latency and accuracy for neural branch prediction
ACM Transactions on Computer Systems (TOCS)
Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines
Proceedings of the 32nd annual international symposium on Computer Architecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Piecewise Linear Branch Prediction
Proceedings of the 32nd annual international symposium on Computer Architecture
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
IEEE Transactions on Computers
Fast branch misprediction recovery in out-of-order superscalar processors
Proceedings of the 19th annual international conference on Supercomputing
A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Dynamically configurable shared CMP helper engines for improved performance
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Control Speculation for Energy-Efficient Next-Generation Superscalar Processors
IEEE Transactions on Computers
An automated design flow for 3D microarchitecture evaluation
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Dynamic memory instruction bypassing
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Microarchitecture evaluation with floorplanning and interconnect pipelining
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures
IEEE Transactions on Computers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic per-branch history length adjustment to improve branch prediction accuracy
Microprocessors & Microsystems
Compacting register file via 2-level renaming and bit-partitioning
Microprocessors & Microsystems
ReCycle:: pipeline adaptation to tolerate process variation
Proceedings of the 34th annual international symposium on Computer architecture
Implementation and Evaluation of a Dynamically Routed Processor Operand Network
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies
IEEE Transactions on Computers
Fetch-Criticality Reduction through Control Independence
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Branch predictor on-line evolutionary system
Proceedings of the 10th annual conference on Genetic and evolutionary computation
Hiding cache miss penalty using priority-based execution for embedded processors
Proceedings of the conference on Design, automation and test in Europe
Investigating the effects of fine-grain three-dimensional integration on microarchitecture design
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Generalizing neural branch prediction
ACM Transactions on Architecture and Code Optimization (TACO)
A criticality-driven microarchitectural three dimensional (3D) floorplanner
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Shapeshifter: Dynamically changing pipeline width and speed to address process variations
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Journal of Computer Science and Technology
A mechanistic performance model for superscalar out-of-order processors
ACM Transactions on Computer Systems (TOCS)
Complexity Effective Bypass Networks
Transactions on High-Performance Embedded Architectures and Compilers II
Journal of Computer Science and Technology
Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design
ACM SIGARCH Computer Architecture News
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Configuring a real time radio signal processor on an embedded system using compiled XML
SIP '07 Proceedings of the Ninth IASTED International Conference on Signal and Image Processing
Finding representative workloads for computer system design
Finding representative workloads for computer system design
Reducing branch misprediction penalties via adaptive pipeline scaling
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Empowering a helper cluster through data-width aware instruction selection policies
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Register Cache System Not for Latency Reduction Purpose
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Comparing FPGA vs. custom cmos and the impact on processor microarchitecture
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Optimizing integrated application performance with cache-aware metascheduling
OTM'11 Proceedings of the 2011th Confederated international conference on On the move to meaningful internet systems - Volume Part II
2L-MuRR: a compact register renaming scheme for SMT processors
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
Misleading energy and performance claims in sub/near threshold digital systems
Proceedings of the International Conference on Computer-Aided Design
CPU DB: recording microprocessor history
Communications of the ACM
Single FU bypass networks for high clock rate superscalar processors
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Micro-architecture performance estimation by formula
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
CPU DB: Recording Microprocessor History
Queue - Processors
Trace execution automata in dynamic binary translation
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Design space exploration of hybrid ultra low power branch predictors
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Predicting Performance Impact of DVFS for Realistic Memory Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Performance analysis of multi-threaded multi-core CPUs
Proceedings of the First International Workshop on Many-core Embedded Systems
Hi-index | 0.03 |
One architectural method for increasing processor performance involves increasing the frequency by implementing deeper pipelines. This paper will explore the relationship between performance and pipeline depth using a Pentium® 4 processor like architecture as a baseline and will show that deeper pipelines can continue to increase performance.This paper will show that the branch misprediction latency is the single largest contributor to performance degradation as pipelines are stretched, and therefore branch prediction and fast branch recovery will continue to increase in importance. We will also show that higher performance cores, implemented with longer pipelines for example, will put more pressure on the memory system, and therefore require larger on-chip caches. Finally, we will show that in the same process technology, designing deeper pipelines can increase the processor frequency by 100%, which, when combined with larger on-chip caches can yield performance improvements of 35% to 90% over a Pentium® 4 like processor.