The microprocessor industry has responded to the memory, power and ILP walls by turning to many-core processors, making increased parallelism the primary means of improving processor performance. These processors are expected to integrate tens or even hundreds of cores. One such processor is the Single-Chip Cloud Computer (SCC), a 48-core experimental processor created by Intel Labs as a platform for many-core software research. In this work we study the behavior of an important irregular kernel, the sparse matrix-vector multiplication (SpMV), on the SCC in terms of both performance and power efficiency. In addition, we evaluate some of the most successful optimization techniques for this kernel, namely reordering, blocking and data compression. Our experiments provide key insights that can serve as guidelines for understanding and optimizing the SpMV kernel on this architecture. Furthermore, we compare the SCC architecturally with several leading multicore processors and GPUs, including the new Intel Xeon Phi coprocessor. Of these platforms, the SCC outperforms only the Itanium2 multicore processor. The high-end GPUs and the Xeon Phi achieve the best performance, although they reach only a small fraction of their peak performance. In terms of power efficiency, the ATI GPUs stand out.
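The abstract refers to the SpMV kernel and to blocking without defining either. As a hedged illustration (not the authors' code, and not tuned for the SCC), the sketch below shows a baseline y = A·x kernel over the compressed sparse row (CSR) format and a register-blocked variant over block CSR (BCSR). The function names, the dense-matrix input used to build the formats, and the requirement that matrix dimensions be multiples of the block size are simplifying assumptions of this example.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def dense_to_csr(a: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Build CSR arrays (values, col_idx, row_ptr) from a dense row-major matrix."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x) -> List[float]:
    """Compute y = A @ x for A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect, irregular access to x
        y[i] = acc
    return y


def dense_to_bcsr(a: Matrix, r: int, c: int):
    """Build BCSR arrays with r x c blocks; any block holding a nonzero is stored densely."""
    n_rows, n_cols = len(a), len(a[0])
    if n_rows % r or n_cols % c:
        raise ValueError("this sketch assumes dimensions divisible by the block size")
    blocks: List[List[float]] = []
    bcol_idx: List[int] = []
    brow_ptr: List[int] = [0]
    for bi in range(n_rows // r):
        for bj in range(n_cols // c):
            block = [a[bi * r + ii][bj * c + jj] for ii in range(r) for jj in range(c)]
            if any(v != 0 for v in block):
                blocks.append(block)   # explicit zeros inside the block are kept
                bcol_idx.append(bj)    # one index per block, not per nonzero
        brow_ptr.append(len(blocks))
    return blocks, bcol_idx, brow_ptr


def spmv_bcsr(blocks, bcol_idx, brow_ptr, x, r: int, c: int) -> List[float]:
    """Compute y = A @ x for A stored in r x c BCSR format."""
    n_brows = len(brow_ptr) - 1
    y = [0.0] * (n_brows * r)
    for bi in range(n_brows):
        for k in range(brow_ptr[bi], brow_ptr[bi + 1]):
            block, j0 = blocks[k], bcol_idx[k] * c
            # Fixed-size inner loops: the part a compiler can unroll in registers.
            for ii in range(r):
                for jj in range(c):
                    y[bi * r + ii] += block[ii * c + jj] * x[j0 + jj]
    return y


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0, 0.0],
         [2.0, 3.0, 0.0, 0.0],
         [0.0, 0.0, 5.0, 0.0],
         [0.0, 6.0, 0.0, 7.0]]
    x = [1.0, 2.0, 3.0, 4.0]
    print(spmv_csr(*dense_to_csr(A), x))             # [6.0, 8.0, 15.0, 40.0]
    print(spmv_bcsr(*dense_to_bcsr(A, 2, 2), x, 2, 2))  # [6.0, 8.0, 15.0, 40.0]
```

Blocking stores one column index per r×c block instead of one per nonzero, which reduces index traffic. The cost is explicit zeros inside partially filled blocks: in the example, the lower-left 2×2 block holds only one nonzero (6.0) but stores four values. Whether that trade pays off depends on the matrix structure.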