A quickly trainable hybrid SOM-based document organization system

  • Authors:
  • Renato Fernandes Corrêa; Teresa Bernarda Ludermir

  • Affiliations:
  • Center of Informatics, Federal University of Pernambuco, P.O. Box 7851, Cidade Universitária, 50.732-970 Recife-PE, Brazil (both authors)

  • Venue:
  • Neurocomputing
  • Year:
  • 2008

Abstract

The large volume of present-day document collections has increased the need for document organization systems that can be trained quickly. This paper presents and evaluates a hybrid system for the self-organization of massive document collections based on the self-organizing map (SOM). The hybrid system trains the document maps on prototypes generated by a clustering algorithm rather than on the documents themselves, which reduces the training time of large maps. We test the system with the k-means and modified leader clustering algorithms. The experiments are carried out with the Reuters-21578 v1.0 and 20 Newsgroups collections. The performance of the system is measured in terms of text categorization effectiveness on the test set and of training time. Experimental results show that the proposed system generates effective document maps in less time than the standard SOM. However, the hybrid system using k-means generates better document maps than the one using modified leader, at the cost of a longer training time.
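The abstract describes the hybrid idea only at a high level: cluster the documents first, then train the SOM on the resulting prototypes. The NumPy sketch below illustrates that pipeline under several assumptions that are not taken from the paper. `kmeans` is plain Lloyd's k-means with k-means++ seeding. `leader` is the basic one-pass leader algorithm, not the authors' *modified* leader, whose changes are not described here. `train_som` is a generic online SOM with a Gaussian neighbourhood and linear decay schedules. Categorization effectiveness is approximated by labeling each map unit with the majority class of the training documents it wins and classifying test documents by their nearest labeled unit. All function names are hypothetical, and the synthetic Gaussian blobs only stand in for real document vectors.

```python
import numpy as np

def bmu_of(W, X):
    """Index of the best-matching unit (nearest weight vector) for each row of X."""
    return ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(X, k, iters=20, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns k prototype vectors."""
    rng = np.random.default_rng(seed)
    C = [X[rng.integers(len(X))]]
    for _ in range(k - 1):  # k-means++: far-away points are sampled more often
        d2 = ((X[:, None, :] - np.asarray(C)[None]) ** 2).sum(axis=2).min(axis=1)
        C.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    C = np.asarray(C, dtype=float)
    for _ in range(iters):
        assign = bmu_of(C, X)  # nearest centroid for each document
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old centroid if its cluster empties
                C[j] = members.mean(axis=0)
    return C

def leader(X, threshold):
    """Basic one-pass leader clustering (hypothetical stand-in, NOT the paper's
    modified leader): a document becomes a new prototype when it is farther
    than `threshold` from every existing leader."""
    leaders = [X[0]]
    for x in X[1:]:
        if np.linalg.norm(np.asarray(leaders) - x, axis=1).min() > threshold:
            leaders.append(x)
    return np.asarray(leaders, dtype=float)

def train_som(P, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online SOM trained on the prototype vectors P instead of on every document."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = P[rng.integers(len(P), size=rows * cols)].astype(float)
    W += rng.normal(scale=0.01, size=W.shape)  # separate units initialised identically
    sigma0, total, t = max(rows, cols) / 2.0, epochs * len(P), 0
    for _ in range(epochs):
        for x in P[rng.permutation(len(P))]:
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            b = bmu_of(W, x[None, :])[0]
            h = np.exp(-((grid - grid[b]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)           # pull winner and neighbours toward x
            t += 1
    return W

def label_units(W, X, y):
    """Label each unit with the majority class of the training docs it wins; -1 if none."""
    b = bmu_of(W, X)
    labels = np.full(len(W), -1)
    for u in range(len(W)):
        if np.any(b == u):
            labels[u] = np.bincount(y[b == u]).argmax()
    return labels

def classify(W, labels, X):
    """Assign each document the label of its nearest labeled map unit."""
    keep = labels >= 0
    return labels[keep][bmu_of(W[keep], X)]

# Usage on synthetic data: three well-separated "topics" in 5 dimensions.
rng = np.random.default_rng(42)
centers = 10.0 * np.eye(3, 5)

def blobs(n):
    y = np.repeat(np.arange(3), n)
    return centers[y] + rng.normal(scale=0.5, size=(3 * n, 5)), y

X_tr, y_tr = blobs(100)
X_te, y_te = blobs(30)
P = kmeans(X_tr, k=12)      # 12 prototypes instead of 300 training documents
W = train_som(P, 4, 4)      # 4 x 4 map trained only on the prototypes
labels = label_units(W, X_tr, y_tr)
acc = (classify(W, labels, X_te) == y_te).mean()
print(P.shape, len(leader(X_tr, 5.0)), acc)
```

The speed-up in this sketch comes from the inner SOM loop iterating over `len(P)` prototypes rather than over every document; swapping `kmeans` for `leader` trades prototype quality for a single cheap pass over the data, which mirrors the trade-off the abstract reports without reproducing the authors' exact algorithms.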