A Scalable Parallel Algorithm for Self-Organizing Maps with Applications to Sparse Data Mining Problems

Authors:
R. D. Lawrence; G. S. Almasi; H. E. Rushmeier
Affiliations:
IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598. lawrence@watson.ibm.com; IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598. almasi@watson.ibm.com; IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598. holly@watson.ibm.com
Venue:
Data Mining and Knowledge Discovery
Year:
1999

Abstract

We describe a scalable parallel implementation of the self-organizing map (SOM) suitable for data-mining applications involving clustering or segmentation of large data sets, such as those encountered in the analysis of customer spending patterns. The parallel algorithm is based on the batch SOM formulation, in which the neural weights are updated at the end of each pass over the training data. The underlying serial algorithm is enhanced to take advantage of the sparseness often encountered in these data sets. Analysis of a realistic test problem shows that the batch SOM algorithm captures key features observed using the conventional on-line algorithm, with comparable convergence rates. Performance measurements on an SP2 parallel computer are given for two retail data sets and a publicly available set of census data. These results demonstrate essentially linear speedup for the parallel batch SOM algorithm, using both a memory-contained sparse formulation and a separate implementation in which the mining data are accessed directly from a parallel file system. We also present visualizations of the census data to illustrate the value of the clustering information obtained via the parallel SOM method.
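The batch formulation can be illustrated with a small sketch. This is a minimal pure-Python example under stated assumptions, not the authors' SP2 implementation. Each record is stored as a hypothetical {feature_index: value} dictionary holding only its nonzeros. The map is a small rectangular grid with a Gaussian neighborhood h[j][k] and an assumed shrinking-width schedule. "Processors" are simulated by splitting the records into shards. In each pass, every shard computes, for each node k, the sum S_k of the records that node wins and the count n_k. These partial sums are added together; in a real MPI code that step would be a global sum such as MPI_Allreduce. Each weight is then set to w_j = sum_k h[j][k] S_k / sum_k h[j][k] n_k. Sparseness is exploited by expanding ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2 so that only the nonzeros of x are visited. This is one common way to exploit sparse inputs; the helper names below are illustrative only.

```python
import math
import random


def grid_neighborhood(n_rows, n_cols, sigma):
    """Gaussian neighborhood weights h[j][k] between nodes j and k of a rectangular map."""
    coords = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return [[math.exp(-((r1 - r2) ** 2 + (c1 - c2) ** 2) / (2.0 * sigma ** 2))
             for (r2, c2) in coords]
            for (r1, c1) in coords]


def best_matching_unit(x, weights, w_norm2):
    """Winning node for sparse record x via ||x-w||^2 = ||x||^2 - 2 x.w + ||w||^2.

    ||x||^2 is the same for every node, so it is dropped from the argmin;
    the dot product only visits the nonzero entries of x.
    """
    best, best_d = 0, float("inf")
    for k, w in enumerate(weights):
        dot = sum(v * w[i] for i, v in x.items())
        d = w_norm2[k] - 2.0 * dot
        if d < best_d:
            best, best_d = k, d
    return best


def partial_sums(records, weights, n_features):
    """One processor's pass over its shard: per-node record sum S_k and count n_k."""
    n_nodes = len(weights)
    # Weights are frozen during a batch pass, so ||w||^2 is computed once per pass.
    w_norm2 = [sum(wi * wi for wi in w) for w in weights]
    sums = [[0.0] * n_features for _ in range(n_nodes)]
    counts = [0] * n_nodes
    for x in records:
        k = best_matching_unit(x, weights, w_norm2)
        counts[k] += 1
        for i, v in x.items():
            sums[k][i] += v
    return sums, counts


def reduce_sums(parts, n_nodes, n_features):
    """Add the per-shard (S, n) pairs; stands in for a global sum such as MPI_Allreduce."""
    sums = [[0.0] * n_features for _ in range(n_nodes)]
    counts = [0] * n_nodes
    for part_sums, part_counts in parts:
        for k in range(n_nodes):
            counts[k] += part_counts[k]
            for i in range(n_features):
                sums[k][i] += part_sums[k][i]
    return sums, counts


def batch_update(weights, sums, counts, h):
    """Batch SOM rule: w_j = sum_k h[j][k] S_k / sum_k h[j][k] n_k."""
    n_nodes = len(weights)
    new_weights = []
    for j, w in enumerate(weights):
        denom = sum(h[j][k] * counts[k] for k in range(n_nodes))
        if denom == 0.0:
            new_weights.append(list(w))  # no records influence node j; keep it
            continue
        new_weights.append([sum(h[j][k] * sums[k][i] for k in range(n_nodes)) / denom
                            for i in range(len(w))])
    return new_weights


def train(records, n_features, n_rows=2, n_cols=2, n_procs=4, epochs=10, seed=0):
    """Simulated data-parallel batch SOM: each shard stands in for one processor."""
    rng = random.Random(seed)
    n_nodes = n_rows * n_cols
    weights = [[rng.random() for _ in range(n_features)] for _ in range(n_nodes)]
    shards = [records[p::n_procs] for p in range(n_procs)]  # round-robin partition
    for epoch in range(epochs):
        sigma = max(0.5, 2.0 * (1.0 - epoch / epochs))  # assumed shrinking schedule
        h = grid_neighborhood(n_rows, n_cols, sigma)
        parts = [partial_sums(shard, weights, n_features) for shard in shards]
        sums, counts = reduce_sums(parts, n_nodes, n_features)
        weights = batch_update(weights, sums, counts, h)
    return weights
```

For example, `train([{0: 1.0, 3: 2.0}, {1: 3.0}], n_features=5, n_procs=2)` returns four weight vectors of length 5, one per node of the default 2 x 2 map. The weights stay fixed for the whole pass. As a result, ||w||^2 is computed once per pass, and each distance costs time proportional to the number of nonzeros in the record rather than the full dimension. The update depends only on the summed S_k and n_k. The number of shards therefore changes the result at most through floating-point rounding in the order of summation. This independence is what makes the data-parallel batch formulation attractive compared with the on-line rule, which updates the weights after every record.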