Some Experiments on Clustering Similar Sentences of Texts in Portuguese

Authors:
Eloize Rossi Seno;Maria Das Nunes
Affiliations:
NILC-ICMC, University of São Paulo, São Carlos, Brazil 13560-970;NILC-ICMC, University of São Paulo, São Carlos, Brazil 13560-970
Venue:
PROPOR '08: Proceedings of the 8th International Conference on Computational Processing of the Portuguese Language
Year:
2008

Abstract

Identifying similar text passages plays an important role in many NLP applications, such as paraphrase generation and automatic summarization. This paper presents experiments on detecting and clustering similar sentences in Brazilian Portuguese texts. We propose an evaluation framework based on an incremental, unsupervised clustering method combined with statistical similarity metrics that measure the semantic distance between sentences. Experiments show that the method is robust even on small data sets. In the best case it achieved an F-measure of 86%, a Purity of 93%, and an Entropy of 0.037.
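
To make the idea of incremental, unsupervised sentence clustering concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity metric, the threshold, or the preprocessing, so the cosine similarity over raw word counts, the threshold of 0.5, the whitespace tokenizer, and the function names `cosine`, `incremental_cluster`, and `purity` are all assumptions chosen for illustration. Purity is computed in its usual form (the fraction of sentences that carry the majority gold label of their cluster); F-measure and Entropy are left out.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(sentences, threshold=0.5):
    """Single-pass clustering: each sentence joins the most similar
    existing cluster if the similarity reaches the (assumed) threshold,
    otherwise it starts a new cluster."""
    centroids = []  # one Counter per cluster: summed word counts of members
    clusters = []   # one list of sentence indices per cluster
    for i, sentence in enumerate(sentences):
        vec = Counter(sentence.lower().split())  # naive whitespace tokenizer
        best, best_sim = None, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)  # fold the sentence into the centroid
        else:
            clusters.append([i])
            centroids.append(vec)
    return clusters


def purity(clusters, gold):
    """Fraction of sentences that share the majority gold label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(Counter(gold[i] for i in c).values()) for c in clusters)
    return majority / total


# Hypothetical toy data, not taken from the paper's corpus.
sentences = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stocks fell sharply today",
    "stocks fell sharply on monday",
]
gold = ["pets", "pets", "markets", "markets"]

clusters = incremental_cluster(sentences)
print(clusters)
print(purity(clusters, gold))
```

Because the method is single-pass, the result can depend on the order in which sentences arrive and on the threshold, so both would need tuning on real data.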