Exploration of a text collection and identification of topics by clustering

  • Authors:
  • Antoine Naud; Shiro Usui

  • Affiliations:
  • RIKEN Brain Science Institute, Wako City, Saitama, Japan and Department of Informatics, N. Copernicus University, Torun, Poland; RIKEN Brain Science Institute, Wako City, Saitama, Japan

  • Venue:
  • IDEAL'07: Proceedings of the 8th International Conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2007

Abstract

An application of cluster analysis to identify topics in a collection of poster abstracts from the 2006 Society for Neuroscience (SfN) Annual Meeting is presented. The topics were identified by selecting, from the abstracts belonging to each cluster, the terms with the highest scores under several term-ranking schemes. The ranking scheme based on log-entropy performed better in this task than the more classical TF-IDF schemes. The extracted topics were evaluated by assigning each cluster to one dominant category and then comparing them with previously defined thematic categories whose titles are available. The results show that repeated bisecting k-means performs better than standard k-means.
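The term-ranking step described in the abstract can be illustrated with a small sketch. This is not the authors' code. It assumes the abstracts are already tokenized and that cluster memberships are given (the clustering itself, standard or repeated bisecting k-means, is not shown). It uses the common log-entropy formulation (local weight log(1 + tf), global weight 1 + Σ p log p / log n, with p = tf / global term frequency) and a plain tf · log(n / df) TF-IDF variant. It scores a cluster's terms by summing their weights over the cluster's documents, which is an assumption, since the paper's exact scoring may differ. The function names `log_entropy_weights`, `tfidf_weights` and `top_terms` are hypothetical.

```python
import math
from collections import Counter


def log_entropy_weights(docs):
    """Return one {term: weight} dict per document (log-entropy weighting).

    Local weight: log(1 + tf). Global weight: 1 + sum_j p_ij * log(p_ij) / log(n),
    where p_ij = tf_ij / gf_i and gf_i is the term's total count in the collection.
    """
    n = len(docs)
    counts = [Counter(d) for d in docs]
    gf = Counter()
    for c in counts:
        gf.update(c)
    glob = {}
    for term, total in gf.items():
        ent = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / total
                ent += p * math.log(p)
        # A term spread evenly over all documents gets a global weight near 0;
        # a term confined to a single document gets a global weight of 1.
        glob[term] = 1.0 + ent / math.log(n) if n > 1 else 1.0
    return [{t: math.log1p(tf) * glob[t] for t, tf in c.items()} for c in counts]


def tfidf_weights(docs):
    """Return one {term: weight} dict per document using tf * log(n / df)."""
    n = len(docs)
    counts = [Counter(d) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]


def top_terms(weights, members, k=3):
    """Sum each term's weight over the documents of one cluster; return the k best."""
    score = Counter()
    for i in members:
        score.update(weights[i])  # adds float weights term by term
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy "abstracts" (hypothetical data) split into two given clusters.
    docs = [
        "neuron neuron spike cortex brain".split(),
        "neuron spike synapse brain".split(),
        "receptor receptor protein kinase brain".split(),
        "receptor protein binding brain".split(),
    ]
    lw = log_entropy_weights(docs)
    print(top_terms(lw, [0, 1], k=1), top_terms(lw, [2, 3], k=1))
```

In this toy run the word "brain", which occurs once in every document, receives a log-entropy weight of (numerically) zero and an exactly zero TF-IDF weight, so it never surfaces as a topic label. Both schemes down-weight such generic vocabulary. They differ in how they treat terms with intermediate spread across the collection.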