Methods for exploratory cluster analysis

  • Authors:
  • Samuel Kaski; Janne Nikkilä; Teuvo Kohonen

  • Affiliations:
  • Helsinki University of Technology, Neural Networks Research Centre, P.O. Box 5400, FIN-02015 HUT, Finland (all authors)

  • Venue:
  • Intelligent exploration of the web
  • Year:
  • 2003

Abstract

The Self-Organizing Map is a nonlinear projection of a high-dimensional data space. It can be used as an ordered groundwork, a two-dimensional graphical display, for visualizing structures in the data. Different locations on the display correspond to different domains of the data space in an orderly fashion. The models used in the mapping are fitted to the data so that they approximate the data distribution nonlinearly but smoothly. In this paper, we introduce new methods for visualizing the cluster structures of the data on the groundwork and for interpreting those structures in terms of the local metric properties of the map. In particular, it is possible to find out which variables have the largest discriminatory power between neighboring clusters. The methods are especially suitable for the exploratory phase of data analysis, or preliminary data mining, in which hypotheses on the targets of the analysis are formulated. We have used the methods to analyze a collection of patent abstract texts. We found, for instance, a cluster of neural network patents that is not distinguished by the official patent classification system.
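
To make the ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It fits a small grid of model vectors with the classic online SOM update, measures the distance between grid-adjacent models (large distances hint at cluster borders), and splits one such squared distance into per-variable shares as a rough proxy for discriminatory power. The function names, the linear decay schedules, and the Gaussian neighbourhood are all assumptions chosen for illustration; the paper's own visualization and local-metric methods may differ.

```python
import math
import random

def train_som(data, rows, cols, epochs=30, seed=0):
    """Fit a rows x cols grid of model vectors with the online SOM rule (illustrative)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: initialise each model vector from a randomly chosen data point.
    models = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    steps = epochs * len(data)
    for t in range(steps):
        frac = t / steps
        alpha = 0.5 * (1.0 - frac)  # learning rate, decays linearly (assumed schedule)
        sigma = max(0.5, (max(rows, cols) / 2.0) * (1.0 - frac))  # shrinking radius
        x = rng.choice(data)
        # Best-matching unit: the model vector closest to the sample.
        bmu = min(models, key=lambda u: sum((m - v) ** 2 for m, v in zip(models[u], x)))
        for u, m in models.items():
            grid_d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
            for k in range(dim):
                m[k] += alpha * h * (x[k] - m[k])
    return models

def neighbour_distances(models):
    """Data-space distance between each pair of grid-adjacent models."""
    out = {}
    for (r, c), m in models.items():
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in models:
                out[((r, c), nb)] = math.dist(m, models[nb])
    return out

def variable_contributions(m_a, m_b):
    """Share of the squared distance between two models carried by each variable."""
    sq = [(a - b) ** 2 for a, b in zip(m_a, m_b)]
    total = sum(sq) or 1.0  # avoid division by zero for identical models
    return [s / total for s in sq]

if __name__ == "__main__":
    # Synthetic data: two groups that differ only in the first variable.
    rng = random.Random(1)
    data = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
    data += [[rng.gauss(5, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
    som = train_som(data, rows=1, cols=6)
    dists = neighbour_distances(som)
    a, b = max(dists, key=dists.get)  # most separated pair of neighbouring models
    print("widest border:", a, b)
    print("per-variable shares:", variable_contributions(som[a], som[b]))
```

On data like this, one would expect the widest border to be dominated by the first variable, which mirrors the kind of question the abstract raises about neighboring clusters. The per-variable share used here is a deliberately simple stand-in, not the local-metric analysis described in the paper.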