A highly efficient implementation of a backpropagation learning algorithm using matrix ISA

Authors:
Mostafa I. Soliman; Samir A. Mohamed
Affiliations:
Computer & Control Section, Electrical Engineering Department, Faculty of Engineering, South Valley University, Aswan, Egypt (both authors)
Venue:
Journal of Parallel and Distributed Computing
Year:
2008


Abstract

BackPropagation (BP) is the best-known learning algorithm for Artificial Neural Networks (ANNs). Considerable research effort has gone into exploiting its parallelism to reduce training time on complex problems, and a modified version of BP based on matrix-matrix multiplication has been proposed for parallel processing. In this paper, we present implementations of this Matrix BackPropagation (MBP) algorithm using scalar, vector, and matrix Instruction Set Architectures (ISAs). We show that the performance of MBP improves when moving from scalar ISA to vector ISA, and improves further when moving from vector ISA to matrix ISA. On a practical application, speech recognition, the speedup of training a neural network with unrolled scalar ISA code over plain scalar ISA code is 1.83. On eight parallel lanes, the speedups of vector, unrolled vector, and matrix ISAs are 10.33, 11.88, and 15.36, respectively, against a maximum theoretical speedup of 16. These results show that matrix ISA achieves near-optimal performance because it reuses loaded data, reduces loop overhead, and overlaps memory operations with arithmetic operations.
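The key idea behind MBP, as the abstract describes it, is that a whole batch of training patterns is pushed through the network with matrix-matrix multiplications rather than one pattern at a time. The following is a minimal sketch of that formulation. It assumes a single hidden layer, tanh activations, a sum-of-squares error, and batch gradient descent, none of which the abstract specifies. The names `matmul`, `forward`, `gradients`, and `mbp_step` are hypothetical. This is plain Python for illustration, not the authors' scalar, vector, or matrix ISA code. In `matmul`, each loaded `a[i][p]` is reused across an entire output row, which is a source-level view of the kind of data reuse the abstract credits to matrix ISA.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix-matrix product, the kernel MBP is built on."""
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must agree"
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        row = out[i]
        for p in range(k):
            aip = a[i][p]  # loaded once, reused for every column j
            bp = b[p]
            for j in range(m):
                row[j] += aip * bp[j]
    return out


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def forward(X: Matrix, W1: Matrix, W2: Matrix) -> tuple[Matrix, Matrix]:
    # X: P x Nin (one row per pattern), W1: Nin x Nh, W2: Nh x Nout
    H = [[math.tanh(v) for v in row] for row in matmul(X, W1)]  # P x Nh
    Y = [[math.tanh(v) for v in row] for row in matmul(H, W2)]  # P x Nout
    return H, Y


def loss(X: Matrix, T: Matrix, W1: Matrix, W2: Matrix) -> float:
    _, Y = forward(X, W1, W2)
    return 0.5 * sum((y - t) ** 2 for ry, rt in zip(Y, T) for y, t in zip(ry, rt))


def gradients(X: Matrix, T: Matrix, W1: Matrix, W2: Matrix) -> tuple[Matrix, Matrix]:
    H, Y = forward(X, W1, W2)
    # Output deltas for the whole batch: (Y - T) * tanh'(.), with tanh' = 1 - y^2
    D2 = [[(y - t) * (1 - y * y) for y, t in zip(ry, rt)] for ry, rt in zip(Y, T)]
    # Hidden deltas: back-propagate through W2 with a single matrix product
    back = matmul(D2, transpose(W2))
    D1 = [[b * (1 - h * h) for b, h in zip(rb, rh)] for rb, rh in zip(back, H)]
    # Weight gradients summed over all P patterns, again as matrix products
    return matmul(transpose(X), D1), matmul(transpose(H), D2)


def mbp_step(X: Matrix, T: Matrix, W1: Matrix, W2: Matrix, lr: float) -> tuple[Matrix, Matrix]:
    G1, G2 = gradients(X, T, W1, W2)
    W1 = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W1, G1)]
    W2 = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W2, G2)]
    return W1, W2


# Toy usage (hypothetical data): logical OR, targets scaled into tanh's range.
random.seed(0)
X = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]  # last column = bias input
T = [[-0.8], [0.8], [0.8], [0.8]]
W1 = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
W2 = [[random.uniform(-0.5, 0.5)] for _ in range(4)]
before = loss(X, T, W1, W2)
for _ in range(200):
    W1, W2 = mbp_step(X, T, W1, W2, lr=0.1)
after = loss(X, T, W1, W2)
print(f"loss {before:.4f} -> {after:.4f}")
```

Because every pattern in the batch is handled by the same few products (`X @ W1`, `H @ W2`, `D2 @ W2^T`, `X^T @ D1`, `H^T @ D2`), the weight matrices are streamed once per batch rather than once per pattern. That regular, level-3-BLAS-style structure is what makes the formulation a natural fit for vector and matrix ISAs.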