Industry: text mining with self-organizing maps

  • Authors:
  • Dieter Merkl

  • Affiliations:
  • Associate Professor of Computer Science, Institute of Software Technology, Vienna University of Technology, Austria

  • Venue:
  • Handbook of data mining and knowledge discovery
  • Year:
  • 2002

Abstract

Today's information age may be characterized by the constant, massive production and dissemination of written information. More powerful tools for exploring, searching, and organizing the available mass of information are needed to cope with this situation. This need is our starting point for applying data mining techniques to unstructured information as present in text archives. Users will particularly benefit from clustering techniques that uncover similar documents and bring these similarities to the user's attention. In our approach to text mining we suggest using self-organizing maps for the analysis of a document archive. The benefit of this approach is the intuitive visualization of document similarities thanks to the spatial ordering of the documents within the self-organizing map. We augment the basic capabilities of the neural network with a data description technique that, based on the features learned by the map, automatically selects the most descriptive features of the input patterns mapped onto a particular unit of the map, thus making the associations between the various clusters within the map explicit. We demonstrate the benefits of this approach by using a real-world document archive composed of articles from Time magazine.
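
The two ideas in the abstract, ordering documents on a self-organizing map and then describing each map unit by its most characteristic features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy term vectors, the function names (`train_som`, `best_unit`, `describe_units`), the decay schedules, and in particular the label-selection rule (terms with a sufficiently large unit weight, ranked by how little the mapped documents deviate from that weight). The abstract does not specify these details, so this should not be read as the author's implementation.

```python
import math
import random

def best_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            win = best_unit(weights, x)
            for unit, w in weights.items():
                d2 = (unit[0] - win[0]) ** 2 + (unit[1] - win[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights

def describe_units(weights, vectors, terms, top=2, min_weight=0.1):
    """Label each occupied unit with up to `top` terms (assumed criterion)."""
    mapped = {}
    for x in vectors:
        mapped.setdefault(best_unit(weights, x), []).append(x)
    labels = {}
    for unit, docs in mapped.items():
        w = weights[unit]
        # Per-term squared deviation between the unit and its mapped documents.
        err = [sum((d[i] - w[i]) ** 2 for d in docs) for i in range(len(w))]
        # Keep only terms the unit has actually learned to weight noticeably.
        candidates = [i for i in range(len(w)) if w[i] >= min_weight]
        candidates.sort(key=lambda i: (err[i], -w[i]))
        labels[unit] = [terms[i] for i in candidates[:top]]
    return labels

# Hypothetical toy archive: rows are documents, columns are term indicators.
terms = ["war", "army", "troops", "film", "actor", "movie"]
documents = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
]

som = train_som(documents, rows=2, cols=2)
for unit, unit_labels in sorted(describe_units(som, documents, terms).items()):
    print(unit, unit_labels)
```

In this sketch, documents that land on the same or neighbouring units are the ones the map considers similar, and the per-unit term lists play the role of the automatic description mentioned above. A real archive would need proper text preprocessing and term weighting, which are left out here.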