IPC Considered Harmful for Multiprocessor Workloads

Authors:
Alaa R. Alameldeen;David A. Wood
Affiliations:
Intel Corporation;University of Wisconsin-Madison
Venue:
IEEE Micro
Year:
2006

Citing 16
Cited 31

The YAGS branch prediction scheme

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Simics: A Full System Simulation Platform

Computer
SimpleScalar: An Infrastructure for Computer System Modeling

Computer
Simulating a $2M Commercial Server on a $2K PC

Computer
SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance

WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Variability in Architectural Simulations of Multi-Threaded Workloads

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Effects of Mispredicted-Path Execution on Branch Prediction Structures

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Redeeming IPC as a Performance Metric for Multithreaded Programs

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Characterizing and Comparing Prevailing Simulation Techniques

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Efficiently Evaluating Speedup Using Sampled Processor Simulation

IEEE Computer Architecture Letters
POWER5 System microarchitecture

IBM Journal of Research and Development - POWER5 and packaging
Computer Organization and Design: The Hardware/Software Interface

Computer Organization and Design: The Hardware/Software Interface

Coherence Ordering for Ring-based Chip Multiprocessors

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual hierarchies to support server consolidation

Proceedings of the 34th annual international symposium on Computer architecture
Characterization of Apache web server with Specweb2005

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
PAM: a novel performance/power aware meta-scheduler for multi-core systems

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
In-network coherence filtering: snoopy coherence without broadcasts

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Utilizing predictors for efficient thermal management in multiprocessor SoCs

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Architectural implications of cache coherence protocols with network applications on chip multiprocessors

NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
On-Chip Network Evaluation Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
The ZCache: Decoupling Ways and Associativity

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Run-time energy management of manycore systems through reconfigurable interconnects

Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
The impact of memory subsystem resource sharing on datacenter applications

Proceedings of the 38th annual international symposium on Computer architecture
Scalable power control for many-core architectures running multi-threaded applications

Proceedings of the 38th annual international symposium on Computer architecture
Capacity metric for chip heterogeneous multiprocessors

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A systematic methodology to develop resilient cache coherence protocols

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
SRP: symbiotic resource partitioning of the memory hierarchy in CMPs

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Dynamic adaptive virtual core mapping to improve power, energy, and performance in multi-socket multicores

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Measuring interference between live datacenter applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Empirical measurement of instruction level parallelism for four generations of ARM CPUs

Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Paragon: QoS-aware scheduling for heterogeneous datacenters

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
CPI2: CPU performance isolation for shared compute clusters

Proceedings of the 8th ACM European Conference on Computer Systems
Reuse-based online models for caches

Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
ZSim: fast and accurate microarchitectural simulation of thousand-core systems

Proceedings of the 40th Annual International Symposium on Computer Architecture
Quantifying the impact of frequency scaling on the energy efficiency of the single-chip cloud computer

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
SMT-centric power-aware thread placement in chip multiprocessors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Jigsaw: scalable software-defined caches

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
DeepDive: transparently identifying and managing performance interference in virtualized environments

USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Ubik: efficient cache sharing with strict qos for latency-critical workloads

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling

ACM Transactions on Architecture and Code Optimization (TACO)
QoS-Aware scheduling in heterogeneous datacenters with paragon

ACM Transactions on Computer Systems (TOCS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many architectural simulation studies use instructions per cycle (IPC) to analyze performance. For multithreaded programs running on multiprocessor systems, however, IPC often inaccurately reflects performance and leads to incorrect or misleading conclusions. Work-related metrics, such as time per transaction, are the most accurate and reliable way to estimate multiprocessor workload performance.