The PARSEC benchmark suite: characterization and architectural implications

Authors:
Christian Bienia; Sanjeev Kumar; Jaswinder Pal Singh; Kai Li
Affiliations:
Princeton University, Princeton, NJ, USA; Intel, Santa Clara, CA, USA; Princeton University, Princeton, NJ, USA; Princeton University, Princeton, NJ, USA
Venue:
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Year:
2008

References: 12
Cited by: 365

Parallel algorithms for VLSI computer-aided design

Parallel algorithms for VLSI computer-aided design
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory system characterization of commercial workloads

Proceedings of the 25th annual international symposium on Computer architecture
Venti: A New Approach to Archival Storage

FAST '02 Proceedings of the Conference on File and Storage Technologies
Variability in Architectural Simulations of Multi-Threaded Workloads

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Particle-based fluid simulation for interactive applications

Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation
Articulated Body Motion Capture by Stochastic Search

International Journal of Computer Vision
Automatic determination of facial muscle activations from sparse motion capture marker data

ACM SIGGRAPH 2005 Papers
Ferret: a toolkit for content-based similarity search of feature-rich data

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
ParallAX: an architecture for real-time physics

Proceedings of the 34th annual international symposium on Computer architecture
Overview of the H.264/AVC video coding standard

IEEE Transactions on Circuits and Systems for Video Technology

Serialization sets: a dynamic dependence-based parallel execution model

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Detecting and tolerating asymmetric races

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
PARSEC: hardware profiling of emerging workloads for CMP design

Proceedings of the 23rd international conference on Supercomputing
Load balancing using work-stealing for pipeline parallelism in emerging applications

Proceedings of the 23rd international conference on Supercomputing
LiteRace: effective sampling for lightweight data-race detection

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Hybrid cache architecture with disparate memory technologies

Proceedings of the 36th annual international symposium on Computer architecture
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
A case for an interleaving constrained shared-memory multi-processor

Proceedings of the 36th annual international symposium on Computer architecture
SigRace: signature-based data race detection

Proceedings of the 36th annual international symposium on Computer architecture
CPU, SMP and GPU implementations of Nohalo level 1, a fast co-convex antialiasing image resampler

C3S2E '09 Proceedings of the 2nd Canadian Conference on Computer Science and Software Engineering
Frequent itemset mining on graphics processors

Proceedings of the Fifth International Workshop on Data Management on New Hardware
vGreen: a system for energy efficient computing in virtualized environments

Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Best of both worlds: A bus enhanced NoC (BENoC)

NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Flow-aware allocation for on-chip networks

NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Segment gating for static energy reduction in Networks-on-Chip

Proceedings of the 2nd International Workshop on Network on Chip Architectures
Future scaling of processor-memory interfaces

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
In-network coherence filtering: snoopy coherence without broadcasts

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A case for dynamic frequency tuning in on-chip networks

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Ordering decoupled metadata accesses in multiprocessors

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing shared cache behavior of chip multiprocessors

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Finding concurrency bugs with context-aware communication graphs

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Operating system scheduling for efficient online self-test in robust systems

Proceedings of the 2009 International Conference on Computer-Aided Design
Load balancing on speed

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
AASH: an asymmetry-aware scheduler for hypervisors

Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
CoreDet: a compiler and runtime system for deterministic multithreaded execution

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Speculative parallelization using software multi-threaded transactions

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Respec: efficient online multiprocessor replay via speculation and external determinism

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Decoupling contention management from scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Inter-core cooperative TLB for chip multiprocessors

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Operating system support for mitigating software scalability bottlenecks on asymmetric multicore processors

Proceedings of the 7th ACM international conference on Computing frontiers
Applying statistical machine learning to multicore voltage & frequency scaling

Proceedings of the 7th ACM international conference on Computing frontiers
LRU-PEA: a smart replacement policy for non-uniform cache architectures on chip multiprocessors

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Cache topology aware computation mapping for multicores

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Green: a framework for supporting energy-conscious programming using controlled approximation

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
DRFX: a simple and efficient memory model for concurrent programming languages

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Evaluating iterative optimization across 1000 datasets

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Opportunities for concurrent dynamic analysis with explicit inter-core communication

Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Quality of service profiling

Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments

Proceedings of the 7th international conference on Autonomic computing
The auction: optimizing banks usage in Non-Uniform Cache Architectures

Proceedings of the 24th ACM International Conference on Supercomputing
SAMS multi-layout memory: providing multiple views of data to boost SIMD performance

Proceedings of the 24th ACM International Conference on Supercomputing
An approach to resource-aware co-scheduling for CMPs

Proceedings of the 24th ACM International Conference on Supercomputing
Simplifying concurrent algorithms by exploiting hardware transactional memory

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Silicon-photonic network architectures for scalable, power-efficient multi-chip systems

Proceedings of the 37th annual international symposium on Computer architecture
Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races

Proceedings of the 37th annual international symposium on Computer architecture
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications

Proceedings of the 37th annual international symposium on Computer architecture
A case for FAME: FPGA architecture model execution

Proceedings of the 37th annual international symposium on Computer architecture
Data marshaling for multi-core architectures

Proceedings of the 37th annual international symposium on Computer architecture
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

Proceedings of the 37th annual international symposium on Computer architecture
Relax: an architectural framework for software recovery of hardware faults

Proceedings of the 37th annual international symposium on Computer architecture
Performance Evaluation of a Multicore System with Optically Connected Memory Modules

NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
Cost-driven 3D integration with interconnect layers

Proceedings of the 47th Design Automation Conference
Virtual channels vs. multiple physical networks: a comparative analysis

Proceedings of the 47th Design Automation Conference
RAMP gold: an FPGA-based architecture simulator for multiprocessors

Proceedings of the 47th Design Automation Conference
Extensible transactional memory testbed

Journal of Parallel and Distributed Computing
A practical way to extend shared memory support beyond a motherboard at low cost

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Accelerating multicore reuse distance analysis with sampling and parallelization

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Subspace snooping: filtering snoops with operating system support

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proximity coherence for chip multiprocessors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Feedback-directed pipeline parallelism

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Scalable hardware support for conditional parallelization

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An OpenCL framework for heterogeneous multicores with local memory

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An empirical characterization of stream programs and its implications for language and compiler design

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
ATAC: a 1000-core cache-coherent processor with on-chip optical network

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Scaling of the PARSEC benchmark inputs

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Distributed systems meet economics: pricing in the cloud

HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Patterns and statistical analysis for understanding reduced resource computing

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Inferring arbitrary distributions for data and computation

Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion
Energy- and endurance-aware design of phase change memory caches

Proceedings of the Conference on Design, Automation and Test in Europe
Power and performance of read-write aware hybrid caches with non-volatile memories

Proceedings of the Conference on Design, Automation and Test in Europe
Balancing memory and performance through selective flushing of software code caches

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Design exploration of hybrid caches with disparate memory technologies

ACM Transactions on Architecture and Code Optimization (TACO)
A trace simplification technique for effective debugging of concurrent programs

Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
On-Chip Network Evaluation Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing power and performance for reliable on-chip networks

Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Efficient throughput-guarantees for latency-sensitive networks-on-chip

Proceedings of the 2010 Asia and South Pacific Design Automation Conference
A generic adaptive path-based routing method for MPSoCs

Journal of Systems Architecture: the EUROMICRO Journal
Thread criticality support in on-chip networks

Proceedings of the Third International Workshop on Network on Chip Architectures
Deterministic process groups in dOS

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Efficient system-enforced deterministic parallelism

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Towards architecture independent metrics for multicore performance analysis

ACM SIGMETRICS Performance Evaluation Review
Probabilistic accuracy bounds for perforated programs: a new foundation for program analysis and transformation

Proceedings of the 20th ACM SIGPLAN workshop on Partial evaluation and program manipulation
Tolerating Concurrency Bugs Using Transactions as Lifeguards

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Architectural Support for Fair Reader-Writer Locking

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Virtual Snooping: Filtering Snoops in Virtualized Multi-cores

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Scalable Speculative Parallelization on Commodity Clusters

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Memory Latency Reduction via Thread Throttling

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The ZCache: Decoupling Ways and Associativity

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
InstantCheck: Checking the Determinism of Parallel Programs Using On-the-Fly Incremental Hashing

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Concurrent Collections

Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism
The future of microprocessors

Communications of the ACM
COREMU: a scalable and portable parallel full-system emulator

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Achieving a single compute device image in OpenCL for multiple GPUs

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
SWEEP: evaluating computer system energy efficiency using synthetic workloads

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
NoC-aware cache design for multithreaded execution on tiled chip multiprocessors

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Efficient processor support for DRFx, a memory model with exceptions

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
RCDC: a relaxed consistency deterministic computer

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Dynamic knobs for responsive power-aware computing

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Memory-efficient frequent-itemset mining

Proceedings of the 14th International Conference on Extending Database Technology
Adaptive timekeeping replacement: Fine-grained capacity management for shared CMP caches

ACM Transactions on Architecture and Code Optimization (TACO)
Parallelization libraries: Characterizing and reducing overheads

ACM Transactions on Architecture and Code Optimization (TACO)
RMS-TM: a comprehensive benchmark suite for transactional memory systems

Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
RAFT: A router architecture with frequency tuning for on-chip networks

Journal of Parallel and Distributed Computing
Characterizing the impact of process variation on 45 nm NoC-based CMPs

Journal of Parallel and Distributed Computing
Array regrouping on CMP with non-uniform cache sharing

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Strategies for preparing computer science students for the multicore world

Proceedings of the 2010 ITiCSE working group reports
Run-time energy management of manycore systems through reconfigurable interconnects

Proceedings of the 21st edition of the Great Lakes Symposium on VLSI
Research note: C-AMTE: A location mechanism for flexible cache management in chip multiprocessors

Journal of Parallel and Distributed Computing
LIME: a framework for debugging load imbalance in multi-threaded execution

Proceedings of the 33rd International Conference on Software Engineering
Inflation and deflation of self-adaptive applications

Proceedings of the 6th International Symposium on Software Engineering for Adaptive and Self-Managing Systems
A study of transactional memory vs. locks in practice

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Boosting parallel applications performance on applying DIM technique in a multiprocessing environment

International Journal of Reconfigurable Computing - Special issue on selected papers from the 17th reconfigurable architectures workshop (RAW2010)
Parallelism orchestration using DoPE: the degree of parallelism executive

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automatic CPU-GPU communication management and optimization

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
A case for an SC-preserving compiler

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Isolating and understanding concurrency errors using reconstructed execution fragments

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Synchronization via scheduling: techniques for efficiently managing shared state

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Optimizing the datacenter for data-centric workloads

Proceedings of the international conference on Supercomputing
A composite and scalable cache coherence protocol for large scale CMPs

Proceedings of the international conference on Supercomputing
Controlling cache utilization of HPC applications

Proceedings of the international conference on Supercomputing
Exploring partitioning methods for 3D Networks-on-Chip utilizing adaptive routing model

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Two-hop free-space based optical interconnects for chip multiprocessors

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
A distributed and topology-agnostic approach for on-line NoC testing

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator for NoC Modeling in Full-System Simulations

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Delay analysis of wormhole based heterogeneous NoC

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Reducing Network-on-Chip energy consumption through spatial locality speculation

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
A Comprehensive Networks-on-Chip Simulator for Error Control Explorations

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Virtualizing performance asymmetric multi-core systems

Proceedings of the 38th annual international symposium on Computer architecture
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks

Proceedings of the 38th annual international symposium on Computer architecture
TLSync: support for multiple fast barriers using on-chip transmission lines

Proceedings of the 38th annual international symposium on Computer architecture
Demand-driven software race detection using hardware performance counters

Proceedings of the 38th annual international symposium on Computer architecture
Sampling + DMR: practical and low-overhead permanent fault detection

Proceedings of the 38th annual international symposium on Computer architecture
An abacus turn model for time/space-efficient reconfigurable routing

Proceedings of the 38th annual international symposium on Computer architecture
A case for globally shared-medium on-chip interconnect

Proceedings of the 38th annual international symposium on Computer architecture
Rapid identification of architectural bottlenecks via precise event counting

Proceedings of the 38th annual international symposium on Computer architecture
Dark silicon and the end of multicore scaling

Proceedings of the 38th annual international symposium on Computer architecture
Moguls: a model to explore the memory hierarchy for bandwidth improvements

Proceedings of the 38th annual international symposium on Computer architecture
A case for heterogeneous on-chip interconnects for CMPs

Proceedings of the 38th annual international symposium on Computer architecture
Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees

Proceedings of the 38th annual international symposium on Computer architecture
Scalable power control for many-core architectures running multi-threaded applications

Proceedings of the 38th annual international symposium on Computer architecture
Considerations when evaluating microprocessor platforms

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
RADBench: a concurrency bug benchmark suite

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Deterministic OpenMP for race-free parallelism

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Parallel pattern detection for architectural improvements

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Mobile processors for energy-efficient web search

ACM Transactions on Computer Systems (TOCS)
Pruning hardware evaluation space via correlation-driven application similarity analysis

Proceedings of the 8th ACM International Conference on Computing Frontiers
A design space exploration of transmission-line links for on-chip interconnect

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
NoC frequency scaling with flexible-pipeline routers

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Device modeling and system simulation of nanophotonic on-chip networks for reliability, power and performance

Proceedings of the 48th Design Automation Conference
A helper thread based dynamic cache partitioning scheme for multithreaded applications

Proceedings of the 48th Design Automation Conference
MARSS: a full system simulator for multicore x86 CPUs

Proceedings of the 48th Design Automation Conference
Managing performance vs. accuracy trade-offs with loop perforation

Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
Programming heterogeneous multicore systems using threading building blocks

Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Dynamic, multi-core cache coherence architecture for power-sensitive mobile processors

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimal memory controller placement for chip multiprocessor

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Probabilistically accurate program transformations

SAS'11 Proceedings of the 18th international conference on Static analysis
A read-write aware replacement policy for phase change memory

APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
A semi-automatic scratchpad memory management framework for CMP

APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Energy efficient many-core processor for recognition and mining using spin-based memory

NANOARCH '11 Proceedings of the 2011 IEEE/ACM International Symposium on Nanoscale Architectures
System implications of memory reliability in exascale computing

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
MAximum Multicore POwer (MAMPO): an automatic multithreaded synthetic power virus generation framework for multicore systems

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A study of 3D Network-on-Chip design for data parallel H.264 coding

Microprocessors & Microsystems
A minimal average accessing time scheduler for multicore processors

ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
Bahurupi: A polymorphic heterogeneous multi-core architecture

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
FlexSig: Implementing flexible hardware signatures

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Thread Tranquilizer: Dynamically reducing performance variation

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
VSim: Simulating multi-server setups at near native hardware speed

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A highly robust distributed fault-tolerant routing algorithm for NoCs with localized rerouting

Proceedings of the 2012 Interconnection Network Architecture: On-Chip, Multi-Chip Workshop
Bandwidth-aware reconfigurable cache design with hybrid memory technologies

Proceedings of the International Conference on Computer-Aided Design
Co-design of channel buffers and crossbar organizations in NoCs architectures

Proceedings of the International Conference on Computer-Aided Design
Improving System Energy Efficiency with Memory Rank Subsetting

ACM Transactions on Architecture and Code Optimization (TACO)
OpenCL as a unified programming model for heterogeneous CPU/GPU clusters

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Clearing the clouds: a study of emerging scale-out workloads on modern hardware

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Iterative optimization for the data center

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
A case for unlimited watchpoints

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
REEact: a customizable virtual execution manager for multicore platforms

VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
A systematic methodology to develop resilient cache coherence protocols

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Idempotent processor architecture

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Formally enhanced runtime verification to ensure NoC functional correctness

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Performance and power aware CMP thread allocation modeling

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Is reuse distance applicable to data locality analysis on chip multiprocessors?

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Exploiting parallelism in deterministic shared memory multiprocessing

Journal of Parallel and Distributed Computing
Characteristics of workloads using the pipeline programming model

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
HPC performance domains on multi-core processors with virtualization

ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Reliability-aware platform optimization for 3D chip multi-processors

The Journal of Supercomputing
Reliability-aware core partitioning in chip multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
Improving performance of adaptive component-based dataflow middleware

Parallel Computing
Transformer: a functional-driven cycle-accurate multicore simulator

Proceedings of the 49th Annual Design Automation Conference
Exploration of heuristic scheduling algorithms for 3D multicore processors

Proceedings of the 15th International Workshop on Software and Compilers for Embedded Systems
Boosting single thread performance in mobile processors via reconfigurable acceleration

ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Reconfigurable multicore architecture for dynamic processor reallocation

ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Trace-driven simulation of memory system scheduling in multithread application

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Parcae: a system for flexible parallel execution

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Speculative separation for privatization and reductions

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Static analysis and compiler design for idempotent processing

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Effective parallelization of loops in the presence of I/O operations

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Automatic speculative DOALL for clusters

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Dynamically managed data for CPU-GPU architectures

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Dynamic adaptive virtual core mapping to improve power, energy, and performance in multi-socket multicores

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Channel borrowing: an energy-efficient nanophotonic crossbar architecture with light-weight arbitration

Proceedings of the 26th ACM international conference on Supercomputing
Hardware support for enforcing isolation in lock-based parallel programs

Proceedings of the 26th ACM international conference on Supercomputing
SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters

Proceedings of the 26th ACM international conference on Supercomputing
Minimizing the Data Transfer Time Using Multicore End-System Aware Flow Bifurcation

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Brief announcement: the problem based benchmark suite

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Waveperf: a benchmark generator for performance evaluation

ACM SIGBED Review - 2nd Workshop on Embed With Linux (EWiLi 2012)
Power Limitations and Dark Silicon Challenge the Future of Multicore

ACM Transactions on Computer Systems (TOCS)
Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Enhancing effective throughput for transmission line-based bus

Proceedings of the 39th Annual International Symposium on Computer Architecture
RADISH: always-on sound and complete Race Detection in Software and Hardware

Proceedings of the 39th Annual International Symposium on Computer Architecture
Euripus: a flexible unified hardware memory checkpointing accelerator for bidirectional-debugging and reliability

Proceedings of the 39th Annual International Symposium on Computer Architecture
A defect-tolerant accelerator for emerging high-performance applications

Proceedings of the 39th Annual International Symposium on Computer Architecture
Can traditional programming bridge the Ninja performance gap for parallel computing applications?

Proceedings of the 39th Annual International Symposium on Computer Architecture
End-to-end sequential consistency

Proceedings of the 39th Annual International Symposium on Computer Architecture
A new degree of freedom for memory allocation in clusters

Cluster Computing
Do we need a crystal ball for task migration?

HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
A template library to integrate thread scheduling and locality management for NUMA multiprocessors

HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Parakeet: a just-in-time parallel accelerator for python

HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
An OpenMP 3.1 validation testsuite

IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Optimizing heterogeneous NoC design

Proceedings of the International Workshop on System Level Interconnect Prediction
Thread vulnerability in parallel applications

Journal of Parallel and Distributed Computing
Dynamic QoS management for chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Deconstructing iterative optimization

ACM Transactions on Architecture and Code Optimization (TACO)
Memory optimization of dynamic binary translators for embedded systems

ACM Transactions on Architecture and Code Optimization (TACO)
Efficient implementation of globally-aware network flow control

Journal of Parallel and Distributed Computing
Power-aware multi-core simulation for early design stage hardware/software co-optimization

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
APCR: an adaptive physical channel regulator for on-chip interconnects

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Complexity-effective multicore coherence

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
HaLock: hardware-assisted lock contention detection in multithreaded applications

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Coalition threading: combining traditional and non-traditional parallelism to maximize scalability

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
ViPZonE: OS-level memory variability-driven physical address zoning for energy savings

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A novel NoC-based design for fault-tolerance of last-level caches in CMPs

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

ACM Transactions on Computer Systems (TOCS)
Comparison of Decision-Making Strategies for Self-Optimization in Autonomic Computing Systems

ACM Transactions on Autonomous and Adaptive Systems (TAAS) - Special Section: Extended Version of SASO 2011 Best Paper
Efficiently combining parallel software using fine-grained, language-level, hierarchical resource management policies

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
ADAPT: A framework for coscheduling multithreaded programs

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Case studies of multi-core energy efficiency in task based programs

ICT-GLOW'12 Proceedings of the Second international conference on ICT as Key Technology against Global Warming
Power challenges may end the multicore era

Communications of the ACM
Exploring object-level parallelism on chip multi-processors

ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Verified integrity properties for safe approximate program transformations

PEPM '13 Proceedings of the ACM SIGPLAN 2013 workshop on Partial evaluation and program manipulation
Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs

ACM Transactions on Computer Systems (TOCS)
Improving last level cache locality by integrating loop and data transformations

Proceedings of the International Conference on Computer-Aided Design
Functional post-silicon diagnosis and debug for networks-on-chip

Proceedings of the International Conference on Computer-Aided Design
Scalable deterministic replay in a parallel full-system emulator

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs

ACM Transactions on Architecture and Code Optimization (TACO)
The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing

ACM Transactions on Architecture and Code Optimization (TACO)
Automatic generation of program affinity policies using machine learning

CC'13 Proceedings of the 22nd international conference on Compiler Construction
Paragon: QoS-aware scheduling for heterogeneous datacenters

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
HOTL: a higher order theory of locality

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Demand-based coordinated scheduling for SMP VMs

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Predicting Coherence Communication by Tracking Synchronization Points at Run Time

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic Acceleration of Multithreaded Program Critical Paths in Near-Threshold Systems

MICROW '12 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture Workshops
Low-Latency Mechanisms for Near-Threshold Operation of Private Caches in Shared Memory Multicores

MICROW '12 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture Workshops
CONCURRIT: a domain specific language for reproducing concurrency bugs

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Conversion: multi-version concurrency control for main memory segments

Proceedings of the 8th ACM European Conference on Computer Systems
Safety-first approach to memory consistency models

Proceedings of the 2013 international symposium on memory management
High-endurance hybrid cache design in CMP architecture with cache partitioning and access-aware policy

Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Cherry-picking: exploiting process variations in dark-silicon homogeneous chip multi-processors

Proceedings of the Conference on Design, Automation and Test in Europe
Efficient software-based fault tolerance approach on multicore platforms

Proceedings of the Conference on Design, Automation and Test in Europe
Proactive aging management in heterogeneous NoCs through a criticality-driven routing approach

Proceedings of the Conference on Design, Automation and Test in Europe
Contrasting wavelength-routed optical NoC topologies for power-efficient 3D-stacked multicore processors using physical-layer analysis

Proceedings of the Conference on Design, Automation and Test in Europe
Modeling and analysis of fault-tolerant distributed memories for networks-on-chip

Proceedings of the Conference on Design, Automation and Test in Europe
Exploring memory consistency for massively-threaded throughput-oriented processors

Proceedings of the 40th Annual International Symposium on Computer Architecture
Reducing memory access latency with asymmetric DRAM bank organizations

Proceedings of the 40th Annual International Symposium on Computer Architecture
ZSim: fast and accurate microarchitectural simulation of thousand-core systems

Proceedings of the 40th Annual International Symposium on Computer Architecture
Studying multicore processor scaling via reuse distance analysis

Proceedings of the 40th Annual International Symposium on Computer Architecture
Criticality stacks: identifying critical threads in parallel programs using synchronization behavior

Proceedings of the 40th Annual International Symposium on Computer Architecture
The locality-aware adaptive cache coherence protocol

Proceedings of the 40th Annual International Symposium on Computer Architecture
A new perspective for efficient virtual-cache coherence

Proceedings of the 40th Annual International Symposium on Computer Architecture
On-the-fly pipeline parallelism

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Proactive circuit allocation in multiplane NoCs

Proceedings of the 50th Annual Design Automation Conference
RISO: relaxed network-on-chip isolation for cloud processors

Proceedings of the 50th Annual Design Automation Conference
Dynamic voltage and frequency scaling for shared resources in multicore processor designs

Proceedings of the 50th Annual Design Automation Conference
HaDeS: architectural synthesis for heterogeneous dark silicon chip multi-processors

Proceedings of the 50th Annual Design Automation Conference
Hierarchical power management for asymmetric multi-core in dark silicon era

Proceedings of the 50th Annual Design Automation Conference
Co-tuning of a hybrid electronic-optical network for reducing energy consumption in embedded CMPs

Proceedings of the First International Workshop on Many-core Embedded Systems
Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory

ACM Journal on Emerging Technologies in Computing Systems (JETC)
Dynamically reconfigurable hybrid cache: an energy-efficient last-level cache design

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Modeling and design exploration of FBDRAM as on-chip memory

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Schedule processes, not VCPUs

Proceedings of the 4th Asia-Pacific Workshop on Systems
Ordering circuit establishment in multiplane NoCs

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Location-aware cache management for many-core processors with deep cache hierarchy

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Performance evaluation of Intel® transactional synchronization extensions for high-performance computing

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Guide-copy: fast and silent migration of virtual machine for datacenters

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
In-network monitoring and control policy for DVFS of CMP networks-on-chip and last level caches

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Deflection routing in 3D network-on-chip with limited vertical bandwidth

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Distributed fair DRAM scheduling in network-on-chips architecture

Journal of Systems Architecture: the EUROMICRO Journal
Optimal placement of vertical connections in 3D Network-on-Chip

Journal of Systems Architecture: the EUROMICRO Journal
ForEVeR: A complementary formal and runtime verification approach to correct NoC functionality

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud

Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
ThermOS: system support for dynamic thermal management of chip multi-processors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
SMT-centric power-aware thread placement in chip multiprocessors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Fairness-aware scheduling on single-ISA heterogeneous multi-cores

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
An empirical model for predicting cross-core performance interference on multicore processors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Jigsaw: scalable software-defined caches

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Building expressive, area-efficient coherence directories

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Racing and pacing to idle: an evaluation of heuristics for energy-aware resource allocation

Proceedings of the Workshop on Power-Aware Computing and Systems
Threadguide: profiler assisted application adaptation on CMP

Proceedings of the 5th IBM Collaborative Academia Research Exchange Workshop
Dynamic thread pinning for phase-based OpenMP programs

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Flexible filters in stream programs

ACM Transactions on Embedded Computing Systems (TECS)
Use it or lose it: wear-out and lifetime in future chip multiprocessors

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
uDIREC: unified diagnosis and reconfiguration for frugal bypass of NoC faults

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Imbalanced cache partitioning for balanced data-parallel programs

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Large-reach memory management unit caches

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
A circuit-architecture co-optimization framework for exploring nonvolatile memory hierarchies

ACM Transactions on Architecture and Code Optimization (TACO)
Modeling the impact of permanent faults in caches

ACM Transactions on Architecture and Code Optimization (TACO)
Quasar: resource-efficient and QoS-aware cluster management

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
REF: resource elasticity fairness with sharing incentives for multiprocessors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Deterministic Galois: on-demand, portable and parameterless

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Price theory based power management for heterogeneous multi-cores

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Post-compiler software optimization for reducing energy

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Estimating the Empirical Cost Function of Routines with Dynamic Workloads

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Dynamic and Adaptive Calling Context Encoding

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
On self-tuning networks-on-chip for dynamic network-flow dominance adaptation

ACM Transactions on Embedded Computing Systems (TECS) - Special Section ESFH'12, ESTIMedia'11 and Regular Papers
Efficient deterministic multithreading without global barriers

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Selecting representative benchmark inputs for exploring microprocessor design spaces

ACM Transactions on Architecture and Code Optimization (TACO)
ReSense: Mapping dynamic workloads of colocated multithreaded applications using resource sensitivity

ACM Transactions on Architecture and Code Optimization (TACO)
PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling

ACM Transactions on Architecture and Code Optimization (TACO)
Learning the optimal operating point for many-core systems with extended range voltage/frequency scaling

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
A generalized software framework for accurate and efficient management of performance goals

Proceedings of the Eleventh ACM International Conference on Embedded Software
QoS-Aware scheduling in heterogeneous datacenters with paragon

ACM Transactions on Computer Systems (TOCS)
Performance implications of non-uniform VCPU-PCPU mapping in virtualization environment

Cluster Computing
The case of using multiple streams in streaming

International Journal of Automation and Computing
Improving platform energy: chip area trade-off in near-threshold computing environment

Proceedings of the International Conference on Computer-Aided Design
Thread-criticality aware dynamic cache reconfiguration in multi-core system

Proceedings of the International Conference on Computer-Aided Design
Dual partitioning multicasting for high-performance on-chip networks

Journal of Parallel and Distributed Computing
Direct distributed memory access for CMPs

Journal of Parallel and Distributed Computing
Exploiting replication to improve performances of NUCA-based CMP systems

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
The effect of communication and synchronization on Amdahl's law in multicore systems

Parallel Computing
Ultra-low-power adder stage design for exascale floating point units

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
PAIS: Parallelism-aware interconnect scheduling in multicores

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Post-silicon platform for the functional diagnosis and debug of networks-on-chip

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
NoC-based fault-tolerant cache design in chip multiprocessors

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
METEOR: Hybrid photonic ring-mesh network-on-chip for multicore architectures

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
BCIBench: a benchmarking suite for EEG-based brain computer interface

Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems
Virtual asymmetric multiprocessor for interactive performance of consolidated desktops

Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Power Modeling for Heterogeneous Processors

Proceedings of Workshop on General Purpose Processing Using GPUs
HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap

ACM Transactions on Architecture and Code Optimization (TACO)
Endurance-aware cache line management for non-volatile caches

ACM Transactions on Architecture and Code Optimization (TACO)
Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures

ACM Transactions on Architecture and Code Optimization (TACO)
DP&TB: a coherence filtering protocol for many-core chip multiprocessors

The Journal of Supercomputing
Eliminating unscalable communication in transaction processing

The VLDB Journal — The International Journal on Very Large Data Bases

Abstract

This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previously available benchmarks for multiprocessors focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications that mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.