Analyzing memory access intensity in parallel programs on multicore

Authors:
Lixia Liu; Zhiyuan Li; Ahmed H. Sameh
Affiliation:
Purdue University, West Lafayette, USA (all authors)
Venue:
Proceedings of the 22nd Annual International Conference on Supercomputing
Year:
2008

Abstract

As the shared memory bus becomes a major performance bottleneck for many numerical applications on multicore chips, understanding how increased on-chip parallelism strains memory bandwidth, and hence affects the efficiency of parallel codes, becomes a critical issue. This paper introduces the notion of memory access intensity to facilitate quantitative analysis of a program's memory behavior on multicore processors that employ state-of-the-art prefetching hardware. Three numerical solvers for large-scale sparse linear systems are used to demonstrate how memory access intensity is estimated and how it affects program performance.
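To make the bandwidth argument concrete, here is a minimal roofline-style sketch. It assumes a CSR sparse matrix-vector product (a kernel common in iterative sparse solvers) running on cores that share a single memory bus. The bytes-per-flop ratio it computes is a simple proxy chosen for illustration; it is not the paper's definition of memory access intensity, and it ignores prefetching, write-allocate traffic, and any cache behavior other than perfect reuse of the input vector. The function names and machine parameters are hypothetical.

```python
"""Roofline-style sketch of shared-bus saturation for a CSR sparse
matrix-vector product. The bytes-per-flop ratio is an illustrative proxy,
NOT the memory access intensity metric defined in the paper."""


def csr_spmv_bytes_per_flop(n_rows: int, nnz: int,
                            value_bytes: int = 8, index_bytes: int = 4) -> float:
    """Estimate memory bytes moved per flop for one y = A @ x in CSR format.

    Assumption: every array is streamed from memory exactly once (x is read
    once in full, i.e. perfect cache reuse of x), and y is written once.
    """
    traffic = (nnz * value_bytes              # nonzero values
               + nnz * index_bytes            # column indices
               + (n_rows + 1) * index_bytes   # row pointers
               + n_rows * value_bytes         # input vector x
               + n_rows * value_bytes)        # output vector y
    flops = 2 * nnz                           # one multiply + one add per nonzero
    return traffic / flops


def predicted_speedup(cores: int, flops_per_core: float,
                      bytes_per_flop: float, peak_bus_bw: float) -> float:
    """Speedup over one core when all cores share one memory bus.

    Achievable flop rate is the smaller of the aggregate compute rate and the
    rate the bus can feed (peak bandwidth divided by bytes per flop).
    """
    bus_limited_rate = peak_bus_bw / bytes_per_flop
    parallel_rate = min(cores * flops_per_core, bus_limited_rate)
    serial_rate = min(flops_per_core, bus_limited_rate)
    return parallel_rate / serial_rate


if __name__ == "__main__":
    bpf = csr_spmv_bytes_per_flop(n_rows=1000, nnz=5000)
    print(f"bytes per flop: {bpf:.4f}")
    # Hypothetical machine: 1 GFLOP/s sustained per core, 16 GB/s shared bus.
    for p in (1, 2, 4, 8):
        print(p, round(predicted_speedup(p, 1e9, bpf, 16e9), 3))
```

Under these assumptions, once the number of cores times the per-core flop rate exceeds what the bus can deliver, adding cores no longer raises throughput; this is the kind of bandwidth saturation the abstract describes, although the paper's own analysis also accounts for hardware prefetching.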