Clock rate versus IPC: the end of the road for conventional microarchitectures

Authors:
Vikas Agarwal;M. S. Hrishikesh;Stephen W. Keckler;Doug Burger
Affiliations:
Computer Architecture and Technology Laboratory, Department of Computer Sciences, The University of Texas at Austin;Computer Architecture and Technology Laboratory, Department of Computer Sciences, The University of Texas at Austin;Computer Architecture and Technology Laboratory, Department of Computer Sciences, The University of Texas at Austin;Computer Architecture and Technology Laboratory, Department of Computer Sciences, The University of Texas at Austin
Venue:
Proceedings of the 27th annual international symposium on Computer architecture
Year:
2000

Abstract

The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing improvements in clock rates and poor wire scaling as semiconductor devices shrink, the achievable performance growth of conventional microarchitectures will slow substantially. In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay. Using the results of these models, we measure the simulated performance (estimating both clock rate and IPC) of an aggressive out-of-order microarchitecture as it is scaled from a 250nm technology to a 35nm technology. We perform this analysis for three clock scaling targets and two microarchitecture scaling strategies: pipeline scaling and capacity scaling. We find that no scaling strategy permits annual performance improvements of better than 12.5%, which is far worse than the annual 50-60% to which we have grown accustomed.
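
The gap between these growth rates can be made concrete with a short compound-growth calculation. The following is a minimal sketch, not the paper's model: it assumes performance is the product of clock rate and IPC (as the title suggests) and that the quoted annual rates compound year over year. The function names and the ten-year horizon are illustrative assumptions.

```python
import math


def performance(clock_hz: float, ipc: float) -> float:
    """Instructions per second: clock rate (cycles/s) times IPC (instructions/cycle)."""
    return clock_hz * ipc


def cumulative_speedup(annual_rate: float, years: int) -> float:
    """Total speedup after `years` of compounding at `annual_rate` (0.125 means 12.5%)."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for performance to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Annual rates taken from the abstract: the 12.5% ceiling and the 50-60% historical range.
    horizon_years = 10  # assumed horizon, chosen only for illustration
    for rate in (0.125, 0.50, 0.60):
        speedup = cumulative_speedup(rate, horizon_years)
        doubling = doubling_time_years(rate)
        print(f"{rate:.1%} per year: x{speedup:.1f} after {horizon_years} years, "
              f"doubling every {doubling:.1f} years")
```

Under these assumptions, a decade at 12.5% per year yields a speedup of roughly 3.2x, whereas a decade at 50% per year yields roughly 58x.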