Optimizing image processing on multi-core CPUs with Intel parallel programming technologies

Authors:
Cheong Ghil Kim;Jeom Goo Kim;Do Hyeon Lee
Affiliations:
Department of Computer Science, Namseoul University, Cheonan-city, Korea 331-707;Department of Computer Science, Namseoul University, Cheonan-city, Korea 331-707;IT Convergence Technology Research & Education Center, Namseoul University, Cheonan-city, Korea 331-707
Venue:
Multimedia Tools and Applications
Year:
2014

Abstract

The rapid advance of computer hardware and the popularity of multimedia applications have made multi-core processors with sub-word parallelism instructions a dominant market trend in desktop PCs as well as high-end mobile devices. This paper presents an efficient parallel implementation of 2D convolution, an algorithm that demands high computing power, on multi-core desktop PCs. 2D convolution is a representative computation-intensive algorithm in image and signal processing applications and is accompanied by heavy memory access, even though its per-pixel computational complexity is relatively low. The purpose of this study is to explore the effectiveness of exploiting the Streaming SIMD (Single Instruction Multiple Data) Extensions (SSE) technology and the TBB (Threading Building Blocks) run-time library on Intel multi-core processors. By doing so, all the hardware features of a multi-core processor can be exploited concurrently for data- and task-level parallelism. For the performance evaluation, we implemented a 3 × 3 kernel-based convolution algorithm using SSE2 and TBB in different combinations and compared their processing speeds. The experimental results show that both technologies have a significant effect on performance and that the processing speed can be greatly improved by using the two technologies at the same time; for example, speedups of 6.2, 6.1, and 1.4 times over implementations using only one of them are reported for 256 × 256, 512 × 512, and 1024 × 1024 data sets, respectively.
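For reference, the standard 2D convolution of an input image x with a 3 × 3 kernel h can be written as follows. The symbols x, y, h, i, j, m and n are introduced here for illustration and are not taken from the paper. Each output pixel costs 9 multiplications and 8 additions but also reads 9 input pixels, which illustrates how the algorithm combines low arithmetic complexity with heavy memory access. Many image-processing codes apply the kernel without flipping it (correlation, using x(i+m, j+n)); the two forms coincide for symmetric kernels such as smoothing filters.

$$
y(i,j) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} h(m,n)\, x(i-m,\, j-n)
$$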
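The data-level (SIMD) side of such an implementation can be sketched in C++ with SSE intrinsics. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a row-major image of `float` pixels (the abstract does not state the pixel type), applies the kernel unflipped (correlation form, identical to convolution for symmetric kernels), leaves border pixels unwritten, and uses the hypothetical function names `conv3x3_scalar` and `conv3x3_simd`. Each pass of the vector loop broadcasts one kernel tap and multiplies it with four neighbouring pixels, so four output pixels are accumulated in one 128-bit register. The SIMD path is compiled only when the compiler defines `__SSE2__` (GCC and Clang do so on x86-64); otherwise the scalar tail handles every column.

```cpp
#include <cassert>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 header; also provides the packed-float (__m128) intrinsics
#endif

// Scalar reference: 3x3 filter on a row-major float image (kernel applied unflipped).
// Border rows and columns of `out` are not written.
void conv3x3_scalar(const float* in, float* out, int w, int h, const float k[9]) {
    assert(in != out);  // neighbours must be read from an unmodified input image
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            float acc = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += k[(dy + 1) * 3 + (dx + 1)] * in[(y + dy) * w + (x + dx)];
            out[y * w + x] = acc;
        }
}

// Data-level parallelism: 4 adjacent output pixels per iteration in one 128-bit register.
void conv3x3_simd(const float* in, float* out, int w, int h, const float k[9]) {
    assert(in != out);
    for (int y = 1; y < h - 1; ++y) {
        int x = 1;
#if defined(__SSE2__)
        // Outputs x..x+3 read inputs x-1..x+4, all inside the row while x + 4 <= w - 1.
        for (; x + 4 <= w - 1; x += 4) {
            __m128 acc = _mm_setzero_ps();
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    __m128 kv = _mm_set1_ps(k[(dy + 1) * 3 + (dx + 1)]);     // broadcast one tap
                    __m128 pv = _mm_loadu_ps(&in[(y + dy) * w + (x + dx)]);  // 4 unaligned pixels
                    acc = _mm_add_ps(acc, _mm_mul_ps(kv, pv));
                }
            _mm_storeu_ps(&out[y * w + x], acc);
        }
#endif
        // Scalar tail: leftover columns (or every column when SSE2 is unavailable).
        for (; x < w - 1; ++x) {
            float acc = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += k[(dy + 1) * 3 + (dx + 1)] * in[(y + dy) * w + (x + dx)];
            out[y * w + x] = acc;
        }
    }
}
```

Unaligned loads (`_mm_loadu_ps`) are used because the shifted neighbour windows starting at x - 1 and x + 1 cannot all be 16-byte aligned at the same time.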
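The task-level side can be sketched by splitting the output rows into horizontal bands that are processed concurrently. The paper uses TBB for this; to keep the sketch buildable without TBB, it swaps in plain `std::thread` with one fixed band per thread, so it is a stand-in and not TBB's work-stealing scheduler. The function names `conv3x3_rows` and `conv3x3_parallel`, the `float` pixel type and the unflipped kernel are illustrative assumptions. Because each band writes disjoint output rows and only reads the shared input image, no locking is required.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Scalar 3x3 filter (kernel applied unflipped) restricted to output rows [y0, y1).
// Rows 0 and h-1 and columns 0 and w-1 are borders and are never written.
void conv3x3_rows(const float* in, float* out, int w, int h, const float* k, int y0, int y1) {
    y0 = std::max(y0, 1);
    y1 = std::min(y1, h - 1);
    for (int y = y0; y < y1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            float acc = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += k[(dy + 1) * 3 + (dx + 1)] * in[(y + dy) * w + (x + dx)];
            out[y * w + x] = acc;
        }
}

// Task-level parallelism: one horizontal band of rows per thread.
void conv3x3_parallel(const float* in, float* out, int w, int h, const float* k, int num_threads) {
    assert(num_threads > 0 && in != out);
    std::vector<std::thread> workers;
    int band = (h + num_threads - 1) / num_threads;  // rows per band, rounded up
    for (int t = 0; t < num_threads; ++t) {
        int y0 = t * band;
        int y1 = std::min(h, y0 + band);
        if (y0 >= y1) break;  // more threads than rows: remaining threads get no work
        workers.emplace_back(conv3x3_rows, in, out, w, h, k, y0, y1);
    }
    for (auto& th : workers) th.join();  // wait until every band is finished
}
```

With TBB, the band loop would typically be replaced by `tbb::parallel_for` over a `tbb::blocked_range<int>(1, h - 1)`, letting the scheduler choose chunk sizes; the per-band body could also call a SIMD row kernel, which is how data- and task-level parallelism are combined. On some Linux toolchains the program must be linked with `-pthread`.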