Benchmarking modern multiprocessors

Authors:
Kai Li; Christian Bienia
Affiliations:
Princeton University; Princeton University
Venue:
PhD thesis, Princeton University
Year:
2011

Cited by (70):

Ensuring operating system kernel integrity with OSck
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems

Evaluating the effectiveness of model-based power characterization
USENIX ATC '11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference

TRACON: interference-aware scheduling for data-intensive applications in virtualized environments
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers

Using explicit platform descriptions to support programming of heterogeneous many-core systems
Parallel Computing

Aikido: accelerating shared data dynamic analyses
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems

Scalable address spaces using RCU balanced trees
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems

Packet chaining: efficient single-cycle allocation for on-chip networks
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

Studying the impact of application-level optimizations on the power consumption of multi-core architectures
Proceedings of the 9th conference on Computing Frontiers

Optimizing energy efficiency of 3-D multicore systems with stacked DRAM under power and thermal constraints
Proceedings of the 49th Annual Design Automation Conference

Metronome: operating system level performance management via self-adaptive computing
Proceedings of the 49th Annual Design Automation Conference

Multicore acceleration of priority-based schedulers for concurrency bug detection
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation

Harmony: collection and analysis of parallel block vectors
Proceedings of the 39th Annual International Symposium on Computer Architecture

Runtime energy consumption estimation for server workloads based on chaotic time-series approximation
ACM Transactions on Architecture and Code Optimization (TACO)

A scalability benchmark suite for Erlang/OTP
Proceedings of the eleventh ACM SIGPLAN workshop on Erlang workshop

Scalability-based manycore partitioning
Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Energy-efficient cache partitioning for future CMPs
Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Comparison of Decision-Making Strategies for Self-Optimization in Autonomic Computing Systems
ACM Transactions on Autonomous and Adaptive Systems (TAAS) - Special Section: Extended Version of SASO 2011 Best Paper

IFRit: interference-free regions for dynamic data-race detection
Proceedings of the ACM international conference on Object oriented programming systems languages and applications

Accurate characterization of the variability in power consumption in modern mobile processors
HotPower'12 Proceedings of the 2012 USENIX conference on Power-Aware Computing and Systems

Legion: expressing locality and independence with logical regions
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

What scientific applications can benefit from hardware transactional memory?
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Probabilistic design methodology to improve run-time stability and performance of STT-RAM caches
Proceedings of the International Conference on Computer-Aided Design

To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems

AUDIT: Stress Testing the Automatic Way
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

Supporting parallel soft real-time applications in virtualized environment
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing

Reuse-based online models for caches
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems

Self-adaptive hybrid dynamic power management for many-core systems
Proceedings of the Conference on Design, Automation and Test in Europe

Energy-efficient multicore chip design through cross-layer approach
Proceedings of the Conference on Design, Automation and Test in Europe

Cache coherence enabled adaptive refresh for volatile STT-RAM
Proceedings of the Conference on Design, Automation and Test in Europe

Fast and optimized task allocation method for low vertical link density 3-dimensional networks-on-chip based many core systems
Proceedings of the Conference on Design, Automation and Test in Europe

A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness
Proceedings of the 40th Annual International Symposium on Computer Architecture

Protozoa: adaptive granularity cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture

Micro-architectural support for metadata coherence in multi-core dynamic information flow tracking
Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy

The autonomic operating system research project: achievements and future directions
Proceedings of the 50th Annual Design Automation Conference

Analysis and characterization of inherent application resilience for approximate computing
Proceedings of the 50th Annual Design Automation Conference

Systematic evaluation of workload clustering for extremely energy-efficient architectures
ACM SIGARCH Computer Architecture News

APE: accelerator processor extensions to optimize data-compute co-location
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness

Analysis and runtime management of 3D systems with stacked DRAM for boosting energy efficiency
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe

Deterministic scale-free pipeline parallelism with hyperqueues
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Low-power, low-storage-overhead chipkill correct via multi-line error correction
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Deadlock-free generic routing algorithms for 3-dimensional Networks-on-Chip with reduced vertical link density topologies
Journal of Systems Architecture: the EUROMICRO Journal

Language support for dynamic, hierarchical data partitioning
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications

Coordinated power-performance optimization in manycores
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

SMT-centric power-aware thread placement in chip multiprocessors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnect
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Towards a performance-as-a-service cloud
Proceedings of the 4th annual Symposium on Cloud Computing

Coloring the cloud for predictable performance
Proceedings of the 4th annual Symposium on Cloud Computing

Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh
ACM Transactions on Design Automation of Electronic Systems (TODAES)

Crank it up or dial it down: coordinated multiprocessor frequency and folding control
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Multi-grain coherence directories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

REF: resource elasticity fairness with sharing incentives for multiprocessors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

DrDebug: Deterministic Replay based Cyclic Debugging with Dynamic Slicing
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

On the advantage of time-varying diversity of workload on functionally asymmetric multi-core
Proceedings of International Workshop on Adaptive Self-tuning Computing Systems

Concurrency testing using schedule bounding: an empirical study
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

A tool to analyze the performance of multithreaded programs on NUMA architectures
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

TornadoNoC: A lightweight and scalable on-chip network architecture for the many-core era
ACM Transactions on Architecture and Code Optimization (TACO)

Analysis of dependence tracking algorithms for task dataflow execution
ACM Transactions on Architecture and Code Optimization (TACO)

Exploiting Performance Counters for Energy Efficient Co-Scheduling of Mixed Workloads on Multi-Core Platforms
Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms

Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors
ACM Transactions on Embedded Computing Systems (TECS)

The case of using multiple streams in streaming
International Journal of Automation and Computing

Dynamic server power capping for enabling data center participation in power markets
Proceedings of the International Conference on Computer-Aided Design

Agent-based distributed power management for kilo-core processors
Proceedings of the International Conference on Computer-Aided Design

A column parity based fault detection mechanism for FIFO buffers
Integration, the VLSI Journal

Ultra-low-power adder stage design for exascale floating point units
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers

Exploiting multi-core nodes in peer-to-peer grids
Journal of Parallel and Distributed Computing

Aggressive Value Prediction on a GPU
International Journal of Parallel Programming

A performance-aware quality of service-driven scheduler for multicore processors
ACM SIGBED Review - Special Issue on the 3rd Embedded Operating System Workshop (EWiLi 2013)

Abstract

Benchmarking has become one of the most important methods for the quantitative performance evaluation of processor and computer system designs. Benchmarking modern multiprocessors such as chip multiprocessors is challenging because of the demands posed by their application domains, their scalability, and their parallelism. In my thesis, I have developed a methodology for designing effective benchmark suites and demonstrated its effectiveness by developing and deploying a benchmark suite for evaluating multiprocessors. More specifically, the thesis makes four contributions. First, it shows that a new benchmark suite for multiprocessors is needed because the behavior of modern parallel programs differs significantly from that of the programs in SPLASH-2, the most popular parallel benchmark suite, which was developed over ten years ago. Second, it quantitatively describes the requirements and characteristics of a set of multithreaded programs and the technology trends underlying them. Third, it presents a systematic approach to scaling and selecting benchmark inputs that optimizes benchmarking accuracy subject to a constraint on execution or simulation time. Finally, it describes PARSEC, a parallel benchmark suite for evaluating modern shared-memory multiprocessors. Since its initial release, PARSEC has been adopted by many architecture groups in both research and industry.
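The third contribution frames input selection as an optimization: maximize benchmarking accuracy while staying within a limit on execution or simulation time. One way to make that trade-off concrete is sketched below, assuming each candidate input of each benchmark has an integer time cost and a scalar error score (its deviation from the behavior of a reference input). Both quantities, and all names in the code (`InputOption`, `select_inputs`, `bench_a`, the numbers), are hypothetical; the thesis's own fidelity metric and selection procedure are not reproduced here. The sketch treats the problem as a multiple-choice knapsack: pick exactly one input per benchmark so that total error is minimal and total cost fits the budget, solved by dynamic programming over the time spent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputOption:
    name: str     # hypothetical input label, e.g. "small"
    cost: int     # assumed execution/simulation time, in integer units
    error: float  # assumed deviation from the reference input's behavior


def select_inputs(
    benchmarks: dict[str, list[InputOption]], budget: int
) -> tuple[float, dict[str, str]]:
    """Choose one input per benchmark, minimizing total error with total cost <= budget.

    A multiple-choice knapsack solved by DP; this is an illustrative sketch,
    not the selection procedure used for PARSEC.
    """
    # states maps total cost spent so far -> (lowest total error, chosen inputs)
    states: dict[int, tuple[float, dict[str, str]]] = {0: (0.0, {})}
    for bench, options in benchmarks.items():
        next_states: dict[int, tuple[float, dict[str, str]]] = {}
        for spent, (err, chosen) in states.items():
            for opt in options:
                cost = spent + opt.cost
                if cost > budget:
                    continue  # this choice would exceed the time budget
                candidate = (err + opt.error, {**chosen, bench: opt.name})
                # keep only the lowest-error selection for each total cost
                if cost not in next_states or candidate[0] < next_states[cost][0]:
                    next_states[cost] = candidate
        states = next_states
        if not states:
            raise ValueError("no input combination fits within the budget")
    # among all feasible total costs, return the selection with the lowest error
    return min(states.values(), key=lambda s: s[0])


# Hypothetical suite: larger inputs cost more time but track the reference more closely.
suite = {
    "bench_a": [InputOption("small", 1, 0.30), InputOption("medium", 4, 0.10),
                InputOption("large", 10, 0.02)],
    "bench_b": [InputOption("small", 2, 0.25), InputOption("medium", 5, 0.08),
                InputOption("large", 12, 0.01)],
}
error, choice = select_inputs(suite, budget=9)
print(error, choice)  # chooses "medium" for both benchmarks
```

With a budget of 9 units, neither "large" input fits, and running both "medium" inputs (cost 9) yields a lower combined error than any mix involving a "small" input. Keying the DP on cost spent keeps the state space bounded by the budget, which stays small when time is measured in coarse units.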