Exploration of a text collection and identification of topics by clustering

Authors:
Antoine Naud;Shiro Usui
Affiliations:
RIKEN Brain Science Institute, Wako City, Saitama, Japan and Department of Informatics, N. Copernicus University, Torun, Poland;RIKEN Brain Science Institute, Wako City, Saitama, Japan
Venue:
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Year:
2007

Abstract

This paper presents an application of cluster analysis to identify topics in a collection of poster abstracts from the Society for Neuroscience (SfN) Annual Meeting in 2006. Topics were identified by selecting, from the abstracts belonging to each cluster, the terms with the highest scores under different ranking schemes. The ranking scheme based on log-entropy performed better in this task than the more classical TF-IDF schemes. The extracted topics were evaluated by assigning each cluster to one dominant category and comparing it with previously defined thematic categories for which titles are available. The results show that repeated bisecting k-means performs better than standard k-means.
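
The term-ranking and evaluation steps described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used log-entropy formulation (local weight log(1 + tf), global weight 1 + sum_j p_ij log p_ij / log n), which may differ in detail from the variant used by the authors. The function names (log_entropy_weights, top_terms, dominant_categories) are hypothetical, and the cluster labels are taken as given rather than computed by k-means or bisecting k-means.

```python
import math
from collections import Counter, defaultdict


def log_entropy_weights(docs):
    """Return one {term: weight} dict per document (assumed standard log-entropy).

    docs is a list of token lists; at least two documents are required
    because the global weight is normalised by log(n).
    """
    n = len(docs)
    if n < 2:
        raise ValueError("log-entropy needs at least two documents")
    tfs = [Counter(d) for d in docs]
    gf = Counter()  # global frequency of each term over the whole collection
    for tf in tfs:
        gf.update(tf)
    # Accumulate sum_j p_ij * log(p_ij), where p_ij = tf_ij / gf_i.
    plogp = defaultdict(float)
    for tf in tfs:
        for term, count in tf.items():
            p = count / gf[term]
            plogp[term] += p * math.log(p)
    # Global weight: 1 for a term confined to one document, 0 for a term
    # spread evenly over all documents.
    glob = {term: 1.0 + plogp[term] / math.log(n) for term in gf}
    return [
        {term: math.log(1 + count) * glob[term] for term, count in tf.items()}
        for tf in tfs
    ]


def top_terms(weights, labels, k=3):
    """Sum the weights of each term inside each cluster and keep the k best."""
    scores = defaultdict(lambda: defaultdict(float))
    for doc_weights, label in zip(weights, labels):
        for term, w in doc_weights.items():
            scores[label][term] += w
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: -kv[1])[:k]]
        for label, s in scores.items()
    }


def dominant_categories(labels, categories):
    """Assign each cluster its most frequent category and compute purity."""
    per_cluster = defaultdict(Counter)
    for label, cat in zip(labels, categories):
        per_cluster[label][cat] += 1
    mapping = {c: cnt.most_common(1)[0][0] for c, cnt in per_cluster.items()}
    matched = sum(cnt.most_common(1)[0][1] for cnt in per_cluster.values())
    return mapping, matched / len(labels)
```

Under this weighting, a term that occurs equally often in every abstract receives a global weight of zero and so never appears among the top terms of a cluster, which is the property that makes the scheme useful for labelling topics.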