Document Clustering and Cluster Topic Extraction in Multilingual Corpora

Authors:
Joaquim Ferreira da Silva;João Mexia;Carlos Agra Coelho;José Gabriel Pereira Lopes
Venue:
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Year:
2001

Cited by:
Discovering key concepts in verbose queries. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Abstract

A statistics-based approach for clustering documents and for extracting cluster topics is described. Relevant (meaningful) Expressions (REs), automatically extracted from corpora, are used as the base clustering features. These features are transformed and their number is strongly reduced in order to obtain a small set of document classification features. This is achieved on the basis of Principal Components Analysis. Model-Based Clustering Analysis then finds the best number of clusters. Finally, the most important REs are extracted from each cluster and taken as the document cluster topics.
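The pipeline in the abstract (RE features, PCA reduction, model-based choice of the number of clusters, per-cluster topic REs) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a precomputed document-by-RE frequency matrix (RE extraction itself is not shown). It keeps the principal components that explain 90% of the variance, which is an arbitrary threshold. For the model-based step it uses a Gaussian mixture whose components share one spherical variance, choosing the number of clusters by BIC; this is a simplified stand-in, since the abstract does not say which mixture model is used. It scores an RE's importance in a cluster as its mean frequency there minus its corpus-wide mean, a hypothetical rule. All function names and the toy data are hypothetical.

```python
import numpy as np


def pca_reduce(X, var_kept=0.9):
    """Project documents (rows of X) onto the leading principal components that
    together explain at least `var_kept` of the variance (threshold is an assumption)."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(s))
    return Xc @ Vt[:k].T


def fit_mixture(Z, g, n_iter=200):
    """EM for a Gaussian mixture whose components share one spherical variance
    (a simplified model-based clustering model). Returns (labels, log-likelihood)."""
    n, d = Z.shape
    # Deterministic start: g documents spread along the first principal component.
    order = np.argsort(Z[:, 0])
    means = Z[order[np.linspace(0, n - 1, g).astype(int)]].copy()
    var = Z.var() + 1e-6
    weights = np.full(g, 1.0 / g)
    for _ in range(n_iter):
        # E-step: log(weight * Gaussian density) for every (document, component).
        sq = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(weights) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate mixing weights, means and the shared variance.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        sq = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum() / (n * d) + 1e-6
    # Labels and log-likelihood refer to the parameters of the last E-step.
    return log_p.argmax(axis=1), float(log_norm.sum())


def choose_clusters(Z, max_g=4):
    """Fit 1..max_g components and keep the model with the lowest BIC."""
    n, d = Z.shape
    best = None
    for g in range(1, min(max_g, n) + 1):
        labels, ll = fit_mixture(Z, g)
        n_params = g * d + (g - 1) + 1  # means, mixing weights, shared variance
        bic = -2.0 * ll + n_params * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, g, labels)
    return best[1], best[2]


def cluster_topics(X, labels, re_names, top=2):
    """Rank REs per cluster by how much their mean frequency in the cluster
    exceeds their corpus-wide mean (a hypothetical scoring rule)."""
    overall = X.mean(axis=0)
    topics = {}
    for c in np.unique(labels):
        score = X[labels == c].mean(axis=0) - overall
        topics[int(c)] = [re_names[i] for i in np.argsort(-score)[:top]]
    return topics


if __name__ == "__main__":
    # Toy document-by-RE frequency matrix (invented data, not from the paper).
    re_names = ["european union", "member states", "stock market", "interest rates"]
    X = np.array([[6, 4, 0, 1], [5, 5, 1, 0], [7, 3, 0, 1], [6, 4, 1, 1],
                  [1, 0, 6, 5], [0, 1, 5, 6], [1, 0, 7, 4], [0, 1, 6, 5]], dtype=float)
    Z = pca_reduce(X)
    g, labels = choose_clusters(Z)
    print(g, labels, cluster_topics(X, labels, re_names))
```

Sharing one variance across components keeps the EM fit from collapsing onto a single document, which matters for small corpora. A fuller model-based clustering would also compare other covariance structures under BIC.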