Sparse matrix-vector multiplication on the Single-Chip Cloud Computer many-core processor

Authors:
Juan C. Pichel;Francisco F. Rivera
Venue:
Journal of Parallel and Distributed Computing
Year:
2013

Abstract

The microprocessor industry has responded to the memory, power and ILP walls by turning to many-core processors, making increased parallelism the primary means of improving processor performance. These processors are expected to consist of tens or even hundreds of cores. One of them is the Single-Chip Cloud Computer (SCC), a 48-core experimental processor created by Intel Labs as a platform for many-core software research. In this work we study the behavior of an important irregular kernel, sparse matrix-vector multiplication (SpMV), on the SCC processor in terms of performance and power efficiency. In addition, some of the most successful optimization techniques for this kernel are evaluated, in particular reordering, blocking and data compression. Our experiments give key insights that can serve as guidelines for understanding and optimizing the SpMV kernel on this architecture. Furthermore, an architectural comparison of the SCC processor with several leading multicore processors and GPUs is performed, including the new Intel Xeon Phi coprocessor. The SCC outperforms only the Itanium2 multicore processor. The best performance is observed for the high-end GPUs and the Xeon Phi, although they achieve only a low percentage of their peak performance. In terms of power efficiency, the ATI GPUs stand out.
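
For readers unfamiliar with the kernel, the following is a minimal sketch of SpMV. It assumes the compressed sparse row (CSR) storage format, which the abstract does not name; the function name `spmv_csr` and the small example matrix are illustrative only and are not taken from the paper. The indirect access `x[col_idx[k]]` is the source of the irregular memory behavior that techniques such as reordering and blocking aim to reduce.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (CSR is an assumed format for illustration, not the paper's stated choice.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access to x: the irregular part of the kernel.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
```

Each row is independent, so a parallel version would typically assign blocks of rows to different cores; how the work is actually distributed on the SCC is not described in the abstract.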