Sense cluster based categorization and clustering of abstracts

Authors:
Davide Buscaldi;Paolo Rosso;Mikhail Alexandrov;Alfons Juan Ciscar
Affiliations:
Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain;Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain;Center for Computing Research, National Polytechnic Institute, Mexico;Dpto. Sistemas Informáticos y Computación (DSIC), Universidad Politécnica de Valencia, Spain
Venue:
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2006

Abstract

This paper focuses on the use of sense clusters for the classification and clustering of very short texts such as conference abstracts. Common keyword-based techniques are effective for very short documents only when the documents pertain to different domains. Conference abstracts, however, all come from a narrow domain (i.e., they share similar terminology), which increases the difficulty of the task. Sense clusters are extracted from the abstracts by exploiting the WordNet relationships between words in the same text. Experiments were carried out for both the categorization task, using Bernoulli mixtures for binary data, and the clustering task, using Stein’s MajorClust method.
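
For readers unfamiliar with the clustering step, the following is a minimal sketch of the general idea behind MajorClust: every node starts in its own cluster and repeatedly adopts the cluster label with the largest total edge weight among its neighbours, until no label changes. It is not the authors' implementation. The function name `majorclust`, the `toy_graph` data, the similarity values, the iteration cap, and the tie-breaking rule are all assumptions made for illustration; in the setting of the abstract, nodes would be abstracts and edge weights some similarity between their sense-cluster representations.

```python
from collections import defaultdict


def majorclust(weights, max_iters=100):
    """Simplified MajorClust-style clustering (illustrative sketch).

    weights: dict mapping node -> {neighbour: similarity}, assumed symmetric.
    Returns a sorted list of clusters, each a sorted list of nodes.
    """
    nodes = sorted(weights)
    # Every node starts in its own singleton cluster.
    label = {node: i for i, node in enumerate(nodes)}

    for _ in range(max_iters):
        changed = False
        for node in nodes:
            # Sum the edge weights towards each neighbouring cluster.
            attraction = defaultdict(float)
            for other, w in weights[node].items():
                if other != node:
                    attraction[label[other]] += w
            if not attraction:
                continue  # isolated node keeps its own cluster
            best = max(attraction.values())
            candidates = [c for c, a in attraction.items() if a == best]
            # Assumed tie-break: keep the current label if it is among
            # the best, otherwise take the smallest candidate label.
            new = label[node] if label[node] in candidates else min(candidates)
            if new != label[node]:
                label[node] = new
                changed = True
        if not changed:
            break

    clusters = defaultdict(list)
    for node in nodes:
        clusters[label[node]].append(node)
    return sorted(clusters.values())


# Hypothetical similarity graph: two tight groups joined by one weak edge.
toy_graph = {
    "a": {"b": 0.9, "c": 0.9},
    "b": {"a": 0.9, "c": 0.9},
    "c": {"a": 0.9, "b": 0.9, "d": 0.1},
    "d": {"c": 0.1, "e": 0.8},
    "e": {"d": 0.8},
}
clusters = majorclust(toy_graph)
print(clusters)  # [['a', 'b', 'c'], ['d', 'e']]
```

The number of clusters is not given in advance; it emerges from the edge weights, which is one reason this family of methods is attractive when the number of topics in a collection is unknown.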