Integrating contextual information to enhance SOM-based text document clustering

  • Authors:
  • Daniel Pullwitt

  • Affiliations:
  • Department of Computer Sciences, University of Leipzig, Graduiertenkolleg Wissensrepräsentation, 04109 Leipzig, Germany

  • Venue:
  • Neural Networks - New developments in self-organizing maps
  • Year:
  • 2002

Abstract

Exploration of text corpora using self-organizing maps has shown promising results in recent years. Topographic map approaches usually rely on the classic vector space model from Information Retrieval to represent text documents. In this paper I present an alternative two-stage model that uses features based on sentence categories and thereby incorporates contextual information. The algorithmic optimizations required by this computationally expensive model are described and evaluated. In addition, a method for the model-independent comparison of document maps, based on evaluating the distribution of documents on the maps, is introduced and used to compare the results obtained with the new model and with the vector space model.
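For readers unfamiliar with the baseline, the following is a minimal plain-Python sketch of the vector space approach the abstract compares against. Documents become L2-normalized TF-IDF vectors and are trained onto a small rectangular self-organizing map. Counting how many documents land on each map unit then gives a document distribution over the map. The whitespace tokenizer, the TF-IDF weighting variant, the learning-rate and neighborhood schedules, and all function names (`tfidf_vectors`, `train_som`, `best_matching_unit`, `document_distribution`) are illustrative assumptions. This is not the author's two-stage sentence-category model, and it is not the paper's map comparison method.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Vector space model (assumed variant): tf * log(N/df), L2-normalized."""
    tokenized = [doc.lower().split()  # naive whitespace tokenizer (assumption)
                 for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training with a Gaussian neighborhood (hypothetical schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)   # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull unit toward the input
            step += 1
    return weights


def document_distribution(weights, vectors):
    """Count how many documents are mapped to each unit of the map."""
    return Counter(best_matching_unit(weights, v) for v in vectors)


docs = [
    "self organizing maps cluster text documents",
    "text documents are clustered with self organizing maps",
    "stock prices fell sharply on monday",
    "prices on the stock market fell",
]
vocab, vecs = tfidf_vectors(docs)
som = train_som(vecs, rows=3, cols=3, epochs=100)
dist = document_distribution(som, vecs)
print(dist)
```

The document distribution is computed here only to make the idea of "documents distributed over map units" concrete. How the paper turns such distributions into a model-independent comparison of two maps is not specified in the abstract and is not reproduced by this sketch.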