The self-organizing map (SOM) is closely related to classical vector quantization (VQ). As in VQ, the SOM represents a distribution of input data vectors by a finite set of models. In both methods, the quantization error (QE) of an input vector can be expressed, e.g., as the Euclidean norm of the difference between the input vector and the best-matching model. Since the models in VQ are usually optimized so that the sum of the squared QEs over the given training vectors is minimized, a common notion is that no other set of models can produce a smaller rms QE. It has therefore come as a surprise that in some cases the rms QE of a SOM can be smaller than that of a VQ with the same number of models and the same input data. This effect can manifest itself when the number of training vectors per model is on the order of small integers and testing is done with an independent set of test vectors.

An explanation seems to follow from statistics. Each model vector in VQ is determined as the average of only those training vectors that are mapped into the same Voronoi domain as the model vector. In contrast, each model vector of the SOM is determined as a weighted average of all training vectors that are mapped into the "topological" neighborhood around the corresponding model. The number of training vectors mapped into the neighborhood of a SOM model is generally much larger than the number mapped into the Voronoi domain around a VQ model. The SOM model vectors are therefore determined with significantly higher statistical accuracy, the Voronoi domains of the SOM are significantly more regular, and the resulting rms QE on independent test data can then be smaller than in VQ. The effective dimensionality of the vectors must, however, also be sufficiently high.
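The comparison described above can be sketched numerically. The following is a minimal, illustrative setup, not the experiment from the text: a plain batch k-means stands in for the VQ, a batch SOM with a shrinking Gaussian neighborhood stands in for the SOM, and both are trained on a small set (a few vectors per model) of synthetic high-dimensional data, then evaluated by rms QE on an independent test set. All function names, the data distribution, and the neighborhood schedule are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def rms_qe(models, data):
    """Root-mean-square quantization error: Euclidean distance from
    each data vector to its best-matching model."""
    d = np.linalg.norm(data[:, None, :] - models[None, :, :], axis=2)
    return float(np.sqrt(np.mean(d.min(axis=1) ** 2)))

def kmeans(data, k, iters=50):
    """Plain batch k-means (VQ): each model is the mean of only the
    training vectors falling in its own Voronoi domain."""
    models = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - models[None, :, :], axis=2)
        bmu = d.argmin(axis=1)
        for j in range(k):
            if np.any(bmu == j):          # guard against empty domains
                models[j] = data[bmu == j].mean(axis=0)
    return models

def batch_som(data, grid_w, grid_h, iters=50):
    """Batch SOM: each model is a neighborhood-weighted average of ALL
    training vectors mapped near it on the 2-D grid, so each model is
    estimated from many more samples than a Voronoi mean."""
    k = grid_w * grid_h
    coords = np.array([(i % grid_w, i // grid_w) for i in range(k)], float)
    grid_d2 = np.sum((coords[:, None] - coords[None, :]) ** 2, axis=2)
    models = data[rng.choice(len(data), k, replace=False)].copy()
    for t in range(iters):
        # neighborhood radius shrinks from 2.0 down to 0.1 (assumed schedule)
        sigma = 2.0 * (0.1 / 2.0) ** (t / max(iters - 1, 1))
        d = np.linalg.norm(data[:, None, :] - models[None, :, :], axis=2)
        bmu = d.argmin(axis=1)
        g = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian grid weights
        w = g[bmu]                                  # (n_samples, k)
        models = (w.T @ data) / w.sum(axis=0)[:, None]
    return models

# High-dimensional data, only ~3 training vectors per model,
# and an independent test set -- the regime discussed in the text.
dim, k = 20, 16
train = rng.normal(size=(3 * k, dim))
test = rng.normal(size=(2000, dim))

vq_models = kmeans(train, k)
som_models = batch_som(train, 4, 4)
print("VQ  rms QE on test set:", rms_qe(vq_models, test))
print("SOM rms QE on test set:", rms_qe(som_models, test))
```

Whether the SOM actually wins on any particular run depends on the data, the amount of training data per model, and the neighborhood schedule; the sketch only makes the two quantities comparable, it does not guarantee the effect.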