Limited-memory matrix methods with applications
Limited-memory matrix methods with applications
A vector space model for automatic indexing
Communications of the ACM
Concept decompositions for large sparse text data using clustering
Machine Learning
Co-clustering documents and words using bipartite spectral graph partitioning
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Document clustering based on non-negative matrix factorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Hierarchical Clustering Algorithms for Document Datasets
Data Mining and Knowledge Discovery
Model-based overlapping clustering
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Exploration of a text collection and identification of topics by clustering
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Hi-index | 0.00 |
This paper presents a preliminary analysis of the neuroscience knowledge domain, and an application of cluster analysis to identify topics in neuroscience. A collection of posters presented at the Society for Neuroscience (SfN) Annual Meeting in 2006 is first explored by viewing existing topics and poster sessions using multidimensional scaling. Based on the Vector Space Model, several Term Spaces were built on the basis of a set of terms extracted from the posters' abstracts and titles, and a set of free keywords assigned to the posters by their authors. The ensuing Term Spaces were compared from the point of view of retrieving the genuine category titles. Topics were extracted from the abstracts of posters by clustering the documents using a bisecting k-means algorithm and selecting the most salient terms for each cluster by ranking. The terms extracted as topic descriptors were evaluated by comparing them to existing titles assigned to thematic categories defined by human experts in neuroscience. A comparison of two approaches for terms ranking (Document Frequency and Log-Entropy) resulted in better performance of the Log-Entropy scores, allowing to retrieve 31.0% of original title terms in clustered documents (and 37.1% in original thematic categories).