Mining massive document collections by the WEBSOM method

  • Authors:
  • Krista Lagus; Samuel Kaski; Teuvo Kohonen

  • Affiliations:
  • Helsinki University of Technology, Neural Networks Research Centre, P.O. Box 5400, FIN-02015 HUT, Finland (all three authors)

  • Venue:
  • Information Sciences: an International Journal - Special issue: Soft computing data mining
  • Year:
  • 2004

Quantified Score

Hi-index 0.01

Abstract

A viable alternative to traditional text-mining methods is WEBSOM, a software system based on the Self-Organizing Map (SOM) principle. Before any searching or browsing, the method orders a collection of textual items, such as documents, according to their contents and maps them onto a regular two-dimensional array of map units. Documents that are similar on the basis of their whole contents are mapped to the same or neighboring map units, and each unit holds links to the document database. Thus, while a search can start by locating the documents that best match the search expression, further relevant results can be found through the pointers stored at the same or neighboring map units, even if those documents did not match the search criterion exactly. This work gives an overview of the WEBSOM method and its performance and, as a special application, describes the WEBSOM map of the texts of the Encyclopaedia Britannica.
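
The idea in the abstract (order documents on a two-dimensional grid, then answer a query from the best-matching unit and its neighbors) can be illustrated with a toy sketch. This is not the WEBSOM software: it assumes plain bag-of-words vectors, a tiny 3x3 grid, and a basic online SOM update, and every name in it (`encode`, `best_unit`, `train_som`, `build_index`, `search`, `DOCS`) is hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy collection; the real system handles far larger corpora.
DOCS = [
    "neural networks learn from data",
    "self organizing map neural networks",
    "medieval castles in europe",
    "castles and fortresses of medieval france",
]

def encode(text, vocab):
    """Turn text into a unit-length bag-of-words vector (an assumed encoding)."""
    counts = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            counts[vocab[word]] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def best_unit(weights, vec):
    """Return the grid coordinate whose model vector is closest to vec."""
    return min(weights,
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], vec)))

def train_som(vectors, rows, cols, epochs=30, seed=0):
    """Basic online SOM training with shrinking learning rate and radius."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac + 0.01
        radius = max(rows, cols) / 2.0 * frac + 0.5
        for vec in vectors:
            br, bc = best_unit(weights, vec)
            for (r, c), w in weights.items():
                # Gaussian neighborhood: units near the winner move the most.
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = rate * math.exp(-d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += h * (vec[i] - w[i])
    return weights

def build_index(docs, rows=3, cols=3):
    """Train the map, then store at each unit the ids of documents mapped to it."""
    words = sorted({w for d in docs for w in d.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    vectors = [encode(d, vocab) for d in docs]
    weights = train_som(vectors, rows, cols)
    units = defaultdict(list)
    for doc_id, vec in enumerate(vectors):
        units[best_unit(weights, vec)].append(doc_id)
    return vocab, weights, units

def search(query, vocab, weights, units, reach=1):
    """Collect documents at the query's best unit and at units within `reach`."""
    br, bc = best_unit(weights, encode(query, vocab))
    hits = []
    for (r, c), doc_ids in units.items():
        if max(abs(r - br), abs(c - bc)) <= reach:
            hits.extend(doc_ids)
    return sorted(hits)

if __name__ == "__main__":
    vocab, weights, units = build_index(DOCS)
    print(search("medieval castles", vocab, weights, units))
```

With `reach=0` only the best-matching unit is consulted; raising `reach` widens the result set to neighboring units, which is how documents that do not match the query exactly can still be returned.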