Discovering key concepts in verbose queries
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
A statistics-based approach for clustering documents and for extracting cluster topics is described. Relevant (meaningful) Expressions (REs) automatically extracted from corpora are used as base features for clustering. These features are transformed and their number is strongly reduced in order to obtain a small set of document classification features. This is achieved on the basis of Principal Components Analysis. Model-Based Clustering Analysis finds the best number of clusters. Then, the most important REs are extracted from each cluster and taken as document cluster topics.
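The pipeline described above (feature reduction, model-based clustering, topic extraction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how REs are scored per document, which criterion selects the number of clusters, or how RE importance is ranked. Here the input is a hypothetical document-by-RE score matrix, scikit-learn's `PCA` and `GaussianMixture` stand in for the two analysis steps, the Bayesian Information Criterion (BIC) is assumed as the selection criterion, and an RE's importance in a cluster is assumed to be its mean score in that cluster minus its corpus-wide mean. The function name `cluster_and_label` and the synthetic data are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def cluster_and_label(features, expressions, n_components=2,
                      max_clusters=5, top_n=2, seed=0):
    """Cluster documents and name each cluster by its strongest REs.

    features: (n_docs, n_expressions) array of per-document RE scores
              (the scoring scheme is an assumption, not from the source).
    expressions: list of RE strings, one per feature column.
    """
    # Step 1: compress the many RE features into a few principal components.
    reduced = PCA(n_components=n_components,
                  random_state=seed).fit_transform(features)

    # Step 2: model-based clustering. Fit one Gaussian mixture per candidate
    # cluster count and keep the one with the lowest BIC (assumed criterion).
    best_model, best_bic = None, np.inf
    for k in range(1, max_clusters + 1):
        model = GaussianMixture(n_components=k, covariance_type="diag",
                                random_state=seed).fit(reduced)
        bic = model.bic(reduced)
        if bic < best_bic:
            best_model, best_bic = model, bic
    labels = best_model.predict(reduced)

    # Step 3: per cluster, rank REs by how far their mean score in the
    # cluster exceeds the corpus-wide mean, and keep the top ones as topics.
    overall = features.mean(axis=0)
    topics = {}
    for c in np.unique(labels):
        lift = features[labels == c].mean(axis=0) - overall
        top = np.argsort(lift)[::-1][:top_n]
        topics[int(c)] = [expressions[i] for i in top]
    return labels, topics


# Synthetic example: 20 documents, 6 placeholder REs, two planted groups.
rng = np.random.default_rng(0)
expressions = [f"RE_{i}" for i in range(6)]
features = rng.random((20, 6)) * 0.1
features[:10, :3] += 1.0   # first group scores high on RE_0..RE_2
features[10:, 3:] += 1.0   # second group scores high on RE_3..RE_5
labels, topics = cluster_and_label(features, expressions)
print(topics)
```

Ranking REs on the original feature matrix rather than on the principal components keeps the topics readable as expressions, since a component is a mixture of many REs and has no direct textual label.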