Artificial neural networks with millions of adjustable parameters and a similar number of training examples are a potential solution for difficult, large-scale pattern recognition problems in areas such as speech and face recognition, classification of large volumes of web data, and finance. The bottleneck is that neural network training involves iterative gradient descent and is extremely computationally intensive. In this paper we present a technique for distributed training of Ultra Large Scale Neural Networks (ULSNN) on Bunyip, a Linux-based cluster of 196 Pentium III processors. To illustrate ULSNN training, we describe an experiment in which a neural network with 1.73 million adjustable parameters was trained to recognize machine-printed Japanese characters from a database containing 9 million training patterns. The training runs at an average performance of 163.3 Gflops/s (single precision). With a machine cost of $150,913, this yields a price/performance ratio of 92.4¢/Mflops/s (single precision).
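The abstract describes distributing iterative gradient descent across cluster nodes. A common way to realize this is data-parallel training: each node holds a shard of the training patterns, computes a local gradient, and the gradients are summed across nodes before an identical weight update. The sketch below is a minimal illustration of that pattern under assumed names (forward_backward, NUM_PARAMS) and an MPI-style message-passing layer; it is not the paper's actual implementation, which may organize communication and the single-precision kernels differently.

```c
/*
 * Hedged sketch: one data-parallel gradient-descent step on a cluster.
 * Assumes an MPI communication layer and a hypothetical forward_backward()
 * routine that accumulates the gradient over this node's training shard.
 */
#include <mpi.h>
#include <stdlib.h>

#define NUM_PARAMS 1730000   /* ~1.73 million adjustable parameters, as in the paper */

/* Hypothetical placeholder: gradient of the loss over the local patterns. */
extern void forward_backward(const float *weights, float *grad_out);

void distributed_sgd_step(float *weights, float lr)
{
    float *local_grad  = calloc(NUM_PARAMS, sizeof(float));
    float *global_grad = malloc(NUM_PARAMS * sizeof(float));

    /* Each node computes the gradient over its own shard of the data. */
    forward_backward(weights, local_grad);

    /* Sum the per-node gradients so every node sees the full-batch gradient. */
    MPI_Allreduce(local_grad, global_grad, NUM_PARAMS, MPI_FLOAT, MPI_SUM,
                  MPI_COMM_WORLD);

    /* Applying the same update on every node keeps the replicated weights in sync. */
    for (int i = 0; i < NUM_PARAMS; i++)
        weights[i] -= lr * global_grad[i];

    free(local_grad);
    free(global_grad);
}
```

Single-precision floats are used throughout the sketch because the reported 163.3 Gflops/s figure is for single precision; the choice of communication primitive (here an allreduce) is an assumption, not a detail given in the abstract.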