Clustering very large dissimilarity data sets

Authors:
Barbara Hammer;Alexander Hasenfuss
Affiliations:
CITEC, University of Bielefeld, Germany;Department of Computer Science, Clausthal University of Technology, Germany
Venue:
ANNPR'10 Proceedings of the 4th IAPR TC3 conference on Artificial Neural Networks in Pattern Recognition
Year:
2010

Abstract

Clustering and visualization are key tasks in computer-supported data inspection, and a variety of promising tools exist for them, such as the self-organizing map (SOM) and its variants. Real-life data, however, pose severe problems for standard data inspection. On the one hand, data are often represented by complex non-vectorial objects, so standard methods for finite-dimensional vectors in Euclidean space cannot be applied. On the other hand, very large data sets have to be dealt with: the data do not fit into main memory, and more than one pass over the data is not affordable, so standard methods simply cannot be applied due to the sheer amount of data. We present two recent extensions of topographic mappings: relational clustering, which can deal with general proximity data given by pairwise distances, and patch processing, which can process streaming data of arbitrary size in patches. Together, they yield an efficient linear-time data inspection method for general dissimilarity data structures. We present the theoretical background as well as applications to the areas of text and multimedia processing based on the generalized compression distance.
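
To make the idea of clustering from pairwise distances concrete, the following is a minimal sketch of the standard relational distance computation, not the authors' implementation. It assumes a symmetric matrix D of squared dissimilarities with zero diagonal, and it represents each prototype only implicitly as a coefficient vector alpha that sums to 1. The squared distance from point i to such a prototype is then (D alpha)_i - 1/2 alpha^T D alpha, which is exact when D stems from Euclidean data. The function names and the simple hard k-means loop are hypothetical choices for illustration; the sketch covers neither the topographic (SOM or neural gas) updates nor patch processing.

```python
def relational_sq_dists(D, alpha):
    """Squared distances from every point to the implicit prototype
    sum_l alpha[l] * x_l, computed from dissimilarities only.

    Assumes D is symmetric with squared dissimilarities and sum(alpha) == 1.
    Uses d(x_i, w)^2 = (D alpha)_i - 0.5 * alpha^T D alpha.
    """
    n = len(D)
    D_alpha = [sum(D[i][l] * alpha[l] for l in range(n)) for i in range(n)]
    quad = 0.5 * sum(alpha[i] * D_alpha[i] for i in range(n))
    return [D_alpha[i] - quad for i in range(n)]


def relational_kmeans(D, init_labels, k, iters=20):
    """Hard relational k-means (illustrative): each prototype is the mean of
    its cluster, stored only as uniform coefficients over the members."""
    n = len(D)
    labels = list(init_labels)
    for _ in range(iters):
        dists = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                # An empty cluster attracts no points in this iteration.
                dists.append([float("inf")] * n)
                continue
            alpha = [1.0 / len(members) if labels[i] == j else 0.0
                     for i in range(n)]
            dists.append(relational_sq_dists(D, alpha))
        new_labels = [min(range(k), key=lambda j: dists[j][i])
                      for i in range(n)]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy data: six points on a line, given to the algorithm only through D.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
D = [[(a - b) ** 2 for b in points] for a in points]
labels = relational_kmeans(D, init_labels=[0, 1, 0, 1, 0, 1], k=2)
```

Each iteration of this sketch touches the full n x n matrix, so its cost is quadratic in the number of points. This is the cost that processing the data in patches, as described in the abstract, is meant to avoid.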