Huge data sets pose new problems for popular clustering and visualization algorithms such as neural gas (NG) and the self-organizing map (SOM) because of memory and time constraints. In such situations it is no longer possible to keep all data points in main memory at once, and only a few passes over the whole data set, ideally a single one, remain affordable if training time is to stay feasible. In this contribution we propose single-pass extensions of the classical clustering algorithms NG and SOM, based on a simple patch decomposition of the data set and fast batch optimization of the underlying cost functions. The algorithms require only a fixed amount of memory and retain the benefits of the originals, including simple implementation and interpretation as well as great flexibility and adaptability. We show that the methods can easily be parallelized, and we demonstrate the efficiency of the approach in a variety of experiments.
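To make the patch idea concrete, the following is a minimal sketch of a single-pass, patch-based batch neural gas in plain Python. It is an illustration under simplifying assumptions, not the authors' exact algorithm: the function name `patch_batch_ng`, the patch size, the neighborhood schedule, and the way prototype multiplicities are carried between patches are all choices made for this example. Each patch is extended by the previous prototypes, weighted by the number of points they compress, so memory stays bounded by the patch size plus the number of prototypes while each data point is read only once.

```python
import math
import random

def patch_batch_ng(stream, n_prototypes=2, patch_size=50,
                   n_epochs=10, lam0=1.0):
    """Single-pass patch sketch of batch neural gas (illustrative only).

    `stream` yields data points as tuples of floats. Memory use is
    bounded by patch_size + n_prototypes; every point is seen once.
    """
    protos = None    # prototype positions
    weights = None   # multiplicity: weight of the points each prototype compresses
    patch = []

    def process(points, point_weights):
        nonlocal protos, weights
        dim = len(points[0])
        if protos is None:
            # initialize prototypes from distinct points of the first patch
            idx = random.sample(range(len(points)), n_prototypes)
            protos = [list(points[i]) for i in idx]
        lam = lam0
        for _ in range(n_epochs):
            # batch NG step: rank prototypes per point, form weighted means
            num = [[0.0] * dim for _ in range(n_prototypes)]
            den = [0.0] * n_prototypes
            for p, w in zip(points, point_weights):
                d = [sum((a - b) ** 2 for a, b in zip(p, q)) for q in protos]
                order = sorted(range(n_prototypes), key=lambda j: d[j])
                for rank, j in enumerate(order):
                    h = w * math.exp(-rank / lam)   # neighborhood cooperation
                    den[j] += h
                    for t in range(dim):
                        num[j][t] += h * p[t]
            for j in range(n_prototypes):
                if den[j] > 0:
                    protos[j] = [num[j][t] / den[j] for t in range(dim)]
            lam *= 0.7   # shrink the neighborhood range
        # multiplicity of each prototype = total weight of the points it wins
        win = [0.0] * n_prototypes
        for p, w in zip(points, point_weights):
            j = min(range(n_prototypes),
                    key=lambda k: sum((a - b) ** 2
                                      for a, b in zip(p, protos[k])))
            win[j] += w
        weights = [max(v, 1.0) for v in win]

    for x in stream:
        patch.append(tuple(x))
        if len(patch) == patch_size:
            # extend the patch by the previous prototypes, counted with
            # their multiplicities, so information carries over
            prev_pts = [tuple(q) for q in (protos or [])]
            process(patch + prev_pts, [1.0] * len(patch) + (weights or []))
            patch = []
    if patch:   # leftover points at the end of the stream
        prev_pts = [tuple(q) for q in (protos or [])]
        process(patch + prev_pts, [1.0] * len(patch) + (weights or []))
    return protos, weights
```

On a stream containing two well-separated clusters, the returned prototypes settle near the cluster centers while only one patch of points is ever held in memory; parallelization along the lines sketched in the abstract would process disjoint patches on different workers and merge their weighted prototypes in the same way.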