Loop Parallelization
Loop Transformations for Restructuring Compilers: The Foundations
High Performance Compilers for Parallel Computing
Enabling unimodular transformations
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems
Hyperplane Partitioning: An Approach to Global Data Partitioning for Distributed Memory Machines
IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
Communication Cost Estimation and Global Data Partitioning for Distributed Memory Machines
HIPC '97 Proceedings of the Fourth International Conference on High-Performance Computing
Today, the challenge is for software to exploit the parallelism offered by multi-core architectures. This can be done by rewriting the application to take advantage of the hardware's capabilities, or by expecting the compiler and software runtime tools to do the job for us. With the advent of multi-core architectures ([1] [2]), this problem is becoming increasingly relevant. Even today, there are few run-time tools that can analyze the behavioral patterns of such performance-critical applications and recompile them accordingly. Consequently, techniques such as OpenMP for shared-memory programs remain useful for exploiting the parallelism available in the machine. This work studies whether loop parallelization (both with and without applying loop transformations) is an effective way to run scientific programs efficiently on such multi-core architectures. We have found the results to be encouraging, and we believe this approach could yield good results if implemented fully in a production compiler for multi-core architectures.