2008 Special Issue: Exploration of a collection of documents in neuroscience and extraction of topics by clustering

Authors:
Antoine Naud; Shiro Usui
Affiliations:
Department of Informatics, Nicolaus Copernicus University, ul. Grudziadzka 5, 87-100 Torun, Poland; Laboratory for Neuroinformatics, RIKEN Brain Science Institute, Hirosawa 2-1, Wako, 351-0198 Saitama, Japan
Venue:
Neural Networks
Year:
2008

Abstract

This paper presents a preliminary analysis of the neuroscience knowledge domain and an application of cluster analysis to the identification of topics in neuroscience. A collection of posters presented at the 2006 Society for Neuroscience (SfN) Annual Meeting is first explored by visualizing the existing topics and poster sessions with multidimensional scaling. Following the Vector Space Model, several Term Spaces were built from a set of terms extracted from the posters' abstracts and titles, and from a set of free keywords assigned to the posters by their authors. The resulting Term Spaces were compared by how well they retrieve the original category titles. Topics were extracted from the poster abstracts by clustering the documents with a bisecting k-means algorithm and then ranking terms to select the most salient ones for each cluster. The terms extracted as topic descriptors were evaluated against the existing titles of thematic categories defined by human experts in neuroscience. Of the two term-ranking approaches compared (Document Frequency and Log-Entropy), Log-Entropy scores performed better, retrieving 31.0% of the original title terms in the clustered documents (and 37.1% in the original thematic categories).
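The pipeline summarized in the abstract (Vector Space Model, log-entropy weighting, bisecting k-means, per-cluster term ranking) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The tokenizer and stopword list are hypothetical. Each bisection step picks the largest cluster and splits it with a deterministically seeded spherical 2-means. A cluster's Log-Entropy score for a term is assumed to be the sum of that term's weights over the cluster's documents, and its Document Frequency score is assumed to be the number of cluster documents containing the term. The weighting itself follows the standard log-entropy scheme: a local weight log(1 + tf_ij) multiplied by a global weight 1 + Σ_j p_ij log p_ij / log n, where p_ij = tf_ij / gf_i and n is the number of documents.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "is", "by", "for", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def log_entropy(docs):
    """Return one sparse, unit-length {term: weight} vector per document."""
    tfs = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    gf = Counter()  # global frequency of each term over the whole collection
    for tf in tfs:
        gf.update(tf)
    # Global weight: 1 for a term confined to one document,
    # 0 for a term spread evenly over all documents.
    g = {}
    for t, total in gf.items():
        ent = sum((tf[t] / total) * math.log(tf[t] / total) for tf in tfs if t in tf)
        g[t] = 1.0 + ent / math.log(n) if n > 1 else 1.0
    vecs = []
    for tf in tfs:
        v = {t: math.log(1 + c) * g[t] for t, c in tf.items()}  # local weight log(1 + tf)
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vecs.append({t: x / norm for t, x in v.items()})
    return vecs


def dot(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(t, 0.0) for t, x in u.items())


def centroid(vecs, idx):
    """Normalized mean direction of the documents listed in idx."""
    c = {}
    for i in idx:
        for t, x in vecs[i].items():
            c[t] = c.get(t, 0.0) + x
    norm = math.sqrt(sum(x * x for x in c.values())) or 1.0
    return {t: x / norm for t, x in c.items()}


def split_in_two(vecs, idx, iters=10):
    """Spherical 2-means on one cluster, seeded deterministically."""
    a = idx[0]
    b = min(idx, key=lambda i: dot(vecs[a], vecs[i]))  # least similar to the first seed
    ca, cb = vecs[a], vecs[b]
    left, right = idx, []
    for _ in range(iters):
        new_left = [i for i in idx if dot(vecs[i], ca) >= dot(vecs[i], cb)]
        new_right = [i for i in idx if dot(vecs[i], ca) < dot(vecs[i], cb)]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split or assignments converged
        left, right = new_left, new_right
        ca, cb = centroid(vecs, left), centroid(vecs, right)
    return left, right


def bisecting_kmeans(vecs, k):
    """Repeatedly split the largest cluster in two until there are k clusters."""
    clusters = [list(range(len(vecs)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        left, right = split_in_two(vecs, largest)
        if not right:  # the largest cluster cannot be split; stop early
            clusters.append(largest)
            break
        clusters += [left, right]
    return clusters


def top_terms(docs, vecs, cluster, n=5, method="log-entropy"):
    """Rank the terms of one cluster by Document Frequency or summed log-entropy weight."""
    score = {}
    for i in cluster:
        if method == "df":
            for t in set(tokenize(docs[i])):
                score[t] = score.get(t, 0) + 1  # number of cluster documents containing t
        else:
            for t, w in vecs[i].items():
                score[t] = score.get(t, 0.0) + w  # assumed: sum of log-entropy weights
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


if __name__ == "__main__":
    docs = [
        "Dopamine neurons and the reward signal",
        "Reward prediction and dopamine release",
        "Synaptic plasticity in the hippocampus and memory",
        "Hippocampus, memory consolidation and sleep",
    ]
    vecs = log_entropy(docs)
    clusters = bisecting_kmeans(vecs, 2)
    print(clusters)  # [[0, 1], [2, 3]]
    for c in clusters:
        print(top_terms(docs, vecs, c, n=2, method="df"), top_terms(docs, vecs, c, n=2))
```

Splitting the largest cluster at each step is only one common selection rule; splitting the cluster with the lowest internal similarity is another, and the abstract does not say which rule the authors used.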