Modifications of the Fuzzy-ARTMAP algorithm for distributed learning in large data sets

  • Authors:
  • Jose Castro; Michael Georgiopoulos

  • Affiliations:
  • University of Central Florida; University of Central Florida

  • Venue:
  • Doctoral dissertation, University of Central Florida
  • Year:
  • 2004


Abstract

The Fuzzy-ARTMAP (FAM) algorithm has proven to be one of the premier neural network architectures for classification problems. FAM can learn online and is usually faster than other neural network approaches. Nevertheless, FAM's training can slow down considerably when the size of the training set grows into the hundreds of thousands. In this dissertation we apply data partitioning and network partitioning to the FAM algorithm, in both sequential and parallel settings, to achieve better convergence time and to train efficiently on large databases (hundreds of thousands of patterns). We implement our parallelization on a Beowulf cluster of workstations, a choice of platform that requires the parallelization to be coarse grained. All of the approaches are tested extensively on three large datasets (half a million data points): the Forest Covertype database of Blackard, and two artificially generated Gaussian datasets with different percentages of overlap between classes. Speedups from the data-partitioning approach reached the order of hundreds without any investment in parallel computation; speedups from the network-partitioning approach are close to linear on the cluster of workstations. Both methods reduced the time needed to train the neural network on large databases from days to minutes. We prove formally that the workload balance of our network-partitioning approaches will never be worse than an acceptable bound, and we also demonstrate the correctness of these parallel variants of FAM.
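
The data-partitioning approach lends itself to a compact illustration: split the training set into disjoint chunks and train an independent FAM network on each chunk in parallel, communicating only after training finishes. The sketch below renders that idea in Python under stated assumptions; the simplified `FuzzyARTMAP` class, the `train_partition` helper, and all parameter values are illustrative stand-ins, not the dissertation's actual implementation.

```python
# Minimal sketch of coarse-grained data partitioning for Fuzzy-ARTMAP training.
# The FuzzyARTMAP class is a simplified, illustrative learner (complement
# coding, fast learning, match tracking), NOT the dissertation's code.
import numpy as np
from multiprocessing import Pool

class FuzzyARTMAP:
    """Simplified Fuzzy-ARTMAP classifier for inputs scaled to [0, 1]."""
    def __init__(self, rho=0.75, alpha=0.001):
        self.rho_base = rho    # baseline vigilance (assumed value)
        self.alpha = alpha     # choice parameter (assumed value)
        self.w = []            # category weight vectors
        self.labels = []       # class label attached to each category

    def _complement_code(self, x):
        # Complement coding: I = (x, 1 - x).
        return np.concatenate([x, 1.0 - x])

    def train_one(self, x, label):
        I = self._complement_code(x)
        rho = self.rho_base
        # Rank categories by the choice function T_j = |I ^ w_j| / (alpha + |w_j|).
        order = sorted(range(len(self.w)),
                       key=lambda j: -np.minimum(I, self.w[j]).sum()
                                     / (self.alpha + self.w[j].sum()))
        for j in order:
            match = np.minimum(I, self.w[j]).sum() / I.sum()
            if match < rho:
                continue                              # fails vigilance test
            if self.labels[j] == label:
                self.w[j] = np.minimum(I, self.w[j])  # fast-learning update
                return
            rho = match + 1e-6                        # match tracking: raise vigilance
        # No committed category passed: commit a new one.
        self.w.append(I.copy())
        self.labels.append(label)

    def fit(self, X, y):
        for x, label in zip(X, y):
            self.train_one(x, label)
        return self

def train_partition(args):
    # One worker trains one independent FAM network on its own data chunk.
    X, y = args
    return FuzzyARTMAP().fit(X, y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((10_000, 4))                 # toy features in [0, 1]
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # toy two-class labels
    n_parts = 4
    chunks = list(zip(np.array_split(X, n_parts), np.array_split(y, n_parts)))
    with Pool(n_parts) as pool:                 # one FAM network per partition
        networks = pool.map(train_partition, chunks)
    print([len(net.w) for net in networks])     # categories learned per partition
```

Because each worker touches only its own chunk and no communication occurs until training ends, the parallelism is coarse grained, which is the property the Beowulf-cluster platform demands; the same partition-then-train structure also explains why the data-partitioning speedup is available even without parallel hardware, since each smaller network trains far faster than one network on the full set.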