A New Document Clustering Algorithm for Topic Discovering and Labeling

Authors:
Henry Anaya-Sánchez;Aurora Pons-Porrata;Rafael Berlanga-Llavori
Affiliations:
Center for Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba;Center for Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba;Department of Languages and Computer Systems, Universitat Jaume I, Castelló, Spain
Venue:
CIARP '08 Proceedings of the 13th Iberoamerican congress on Pattern Recognition: Progress in Pattern Recognition, Image Analysis and Applications
Year:
2008

Citing 4
Cited 0

Scatter/Gather: a cluster-based approach to browsing large document collections. SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

Frequent term-based text clustering. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Topic discovery based on text mining techniques. Information Processing and Management: an International Journal

Text document clustering based on frequent word meaning sequences. Data & Knowledge Engineering

Abstract

In this paper, we introduce a new clustering algorithm for obtaining labeled document clusters that accurately identify the topics of a text collection. To determine the topics, our approach relies on both probable term pairs generated from the collection and an estimate of the topic homogeneity associated with clusters of term pairs. Experimental results on two benchmark text collections demonstrate the utility of this new approach.
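
The abstract does not say how the probable term pairs are generated. As a rough illustration only, the sketch below assumes that a pair's probability is estimated as the fraction of documents containing both terms, and that a pair counts as "probable" when this fraction reaches a threshold. The function name `probable_term_pairs`, the parameter `min_prob`, the whitespace tokenization, and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def probable_term_pairs(docs, min_prob=0.5):
    """Return term pairs whose document co-occurrence probability is >= min_prob.

    The probability of a pair is estimated as the fraction of documents that
    contain both terms. This estimator and the threshold are illustrative
    assumptions, not the definition used in the paper.
    """
    n_docs = len(docs)
    pair_counts = Counter()
    for doc in docs:
        # Count each unordered pair at most once per document.
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    return {
        pair: count / n_docs
        for pair, count in pair_counts.items()
        if count / n_docs >= min_prob
    }


# Toy collection with two obvious topics (hypothetical data).
docs = [
    "election vote president",
    "election vote campaign",
    "earthquake rescue damage",
    "earthquake damage city",
]
pairs = probable_term_pairs(docs, min_prob=0.5)
```

On this toy collection, only the pairs that recur within a topic survive the threshold. Grouping such pairs and scoring each group for topic homogeneity, the second ingredient named in the abstract, is not attempted here because the abstract gives no detail on how that estimate is computed.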