Self-organising maps in document classification: a comparison with six machine learning methods

  • Authors:
  • Jyri Saarikoski; Jorma Laurikkala; Kalervo Järvelin; Martti Juhola

  • Affiliations:
  • Department of Computer Sciences, University of Tampere, Finland; Department of Computer Sciences, University of Tampere, Finland; Department of Information Studies and Interactive Media, University of Tampere, Finland; Department of Computer Sciences, University of Tampere, Finland

  • Venue:
  • ICANNGA'11: Proceedings of the 10th International Conference on Adaptive and Natural Computing Algorithms, Volume Part I
  • Year:
  • 2011

Abstract

This paper focuses on the use of self-organising maps, also known as Kohonen maps, for classifying text documents. The aim is to classify documents effectively and automatically into separate classes according to their topics. Classification with the self-organising map was tested on three data sets, and the results were compared with those of six well-known baseline methods: k-means clustering, Ward's clustering, k-nearest-neighbour searching, discriminant analysis, the Naïve Bayes classifier and the classification tree. The self-organising map yielded the highest accuracies among the tested unsupervised methods in classifying the Reuters news collection and the Spanish CLEF 2003 news collection, and accuracies comparable to those of some of the supervised methods on all three data sets.
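A self-organising map is trained without labels, so using it as a classifier requires an extra step. The following is a minimal sketch of one common approach. It is not necessarily the authors' procedure, because the abstract does not state their map size, training schedule, labelling rule or document representation. The sketch trains a small map with the online Kohonen rule, using a Gaussian neighbourhood and linearly decaying rates, all of which are assumptions. Each node is then labelled with the majority class of the training documents mapped to it, and a new document receives the label of its nearest labelled node. The helper names (`train_som`, `label_nodes`, `classify`, `normalise`) and the toy term-count vectors are hypothetical.

```python
import math
import random
from collections import Counter


def normalise(v):
    """Scale a term-count vector to unit length (assumed representation)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n else list(v)


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen training on a rows x cols grid (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every node's weight vector to a copy of a random training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood on the grid around the best-matching unit.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def label_nodes(weights, data, labels):
    """Label each node with the majority class of training documents mapped to it."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    return {node: c.most_common(1)[0][0] for node, c in votes.items()}


def classify(weights, node_labels, x):
    """Assign the label of the nearest node that received at least one training document."""
    labelled = {n: weights[n] for n in node_labels}
    return node_labels[best_matching_unit(labelled, x)]


# Hypothetical toy corpus: 4-term count vectors for two topics.
train = [normalise(v) for v in [[3, 2, 0, 0], [2, 3, 0, 1], [4, 1, 0, 0],
                                [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 4, 1]]]
labels = ["sport"] * 3 + ["finance"] * 3
som = train_som(train, 3, 3, epochs=30, seed=1)
node_labels = label_nodes(som, train, labels)
print(classify(som, node_labels, normalise([3, 3, 0, 0])))
```

Nodes that no training document maps to stay unlabelled and are skipped at classification time. Under this labelling step, the SOM behaves somewhat like a nearest-prototype classifier. Real experiments would use much larger maps and weighted term vectors (for example tf-idf) rather than raw counts.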