Patch clustering for massive data sets

Authors:
Nikolai Alex;Alexander Hasenfuss;Barbara Hammer
Affiliations:
University of Applied Science Braunschweig/Wolfenbüttel, Department of Computer Science, Salzdahlumer Str. 46/48, 38302 Wolfenbüttel, Germany;Clausthal University of Technology, Department of Informatics, Julius-Albert-Str. 4, 38678 Clausthal-Zellerfeld, Germany;Clausthal University of Technology, Department of Informatics, Julius-Albert-Str. 4, 38678 Clausthal-Zellerfeld, Germany
Venue:
Neurocomputing
Year:
2009

Abstract

Huge data sets pose new problems for popular clustering and visualization algorithms such as neural gas (NG) and the self-organizing map (SOM) because of memory and time constraints. In such situations, it is no longer possible to store all data points in main memory at once, and only a few passes over the whole data set, ideally a single one, are affordable if training time is to remain feasible. In this contribution we propose single-pass extensions of the classical clustering algorithms NG and SOM, based on a simple patch decomposition of the data set and fast batch optimization schemes for the underlying cost function. The algorithms require only a fixed amount of memory. They retain the benefits of the original algorithms, including easy implementation and interpretation as well as great flexibility and adaptability. We demonstrate that the methods can be parallelized easily, and we show the efficiency of the approach in a variety of experiments.
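
The abstract does not spell out the patch scheme, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the data arrive as a stream, that each fixed-size patch is clustered with a weighted batch neural gas, and that the resulting prototypes are carried into the next patch as extra points weighted by the number of data points they represent. The function names (`batch_ng`, `patch_clustering`), the annealing schedule, and all parameter defaults are illustrative assumptions.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_ng(points, weights, init, epochs=20):
    """Weighted batch neural gas on data that fits in memory (sketch)."""
    k, dim = len(init), len(init[0])
    protos = [list(p) for p in init]
    lam0, lam_end = max(k / 2.0, 1.0), 0.01  # assumed annealing range
    for t in range(epochs):
        # Neighbourhood range shrinks geometrically from lam0 to lam_end.
        lam = lam0 * (lam_end / lam0) ** (t / max(epochs - 1, 1))
        num = [[0.0] * dim for _ in range(k)]
        den = [0.0] * k
        for x, w in zip(points, weights):
            dists = [sqdist(x, p) for p in protos]
            ranked = sorted(range(k), key=dists.__getitem__)
            for rank, j in enumerate(ranked):
                h = w * math.exp(-rank / lam)  # rank-based neighbourhood
                den[j] += h
                for d in range(dim):
                    num[j][d] += h * x[d]
        # Each prototype becomes the neighbourhood-weighted mean of the data.
        protos = [[n / den[j] for n in num[j]] if den[j] > 0 else protos[j]
                  for j in range(k)]
    return protos


def patch_clustering(stream, k, patch_size, epochs=20, seed=0):
    """Single pass over `stream`; memory use is O(patch_size + k)."""
    rng = random.Random(seed)
    protos, counts, patch = None, None, []

    def process(patch, protos, counts):
        points, weights = list(patch), [1.0] * len(patch)
        if protos is None:
            # Assumes the first patch holds at least k points.
            init = rng.sample(patch, k)
        else:
            # Old prototypes stand in for all data seen so far.
            points += protos
            weights += counts
            init = protos
        new = batch_ng(points, weights, init, epochs)
        new_counts = [0.0] * k
        for x, w in zip(points, weights):
            winner = min(range(k), key=lambda i: sqdist(x, new[i]))
            new_counts[winner] += w
        return new, new_counts

    for x in stream:
        patch.append(x)
        if len(patch) == patch_size:
            protos, counts = process(patch, protos, counts)
            patch = []
    if patch:  # final, possibly smaller patch
        protos, counts = process(patch, protos, counts)
    return protos, counts
```

Calling `patch_clustering(iter(data), k=2, patch_size=50)` returns the final prototypes together with the number of points each one represents. Only one patch and `k` weighted prototypes are held in memory at any time, which is the fixed-memory property the abstract claims. Under the same assumption, independent patches could be clustered on separate workers and their weighted prototypes merged afterwards.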