Test collection recycling for semantic text similarity

Authors:
Faisal Rahutomo; Teruaki Kitasuka; Masayoshi Aritsugi
Affiliations:
State Polytechnics of Malang, Malang, Indonesia and Kumamoto University, Kumamoto, Japan; Kumamoto University, Kumamoto, Japan; Kumamoto University, Kumamoto, Japan
Venue:
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Year:
2012

Abstract

Semantic text similarity (STS) methods are evaluated on dedicated test collections. These collections consist of text pairs that share the same meaning despite differing in surface form. Such collections are scarce compared with information retrieval (IR) test collections. This paper investigates the possibility of reusing IR test collections for STS tasks. Text pairs are derived from the relevant pairs in IR test collections. Latent semantic analysis (LSA) and explicit semantic analysis (ESA) are used to evaluate Glasgow's test collections, which are provided by the ACM SIGIR community. The Jaccard index measures lexical similarity. The recall metric measures the retrievability of the recycled test collections, with two existing test collections, the Microsoft Research Paraphrase Corpus and the Microsoft Research Video Description Corpus, as evaluation baselines. The evaluation yields a promising outcome: the evaluated test collections have a low Jaccard index, and their recall values lie between those of the two baselines.
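
The Jaccard index mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the texts are tokenized, so the lowercase word-level tokenization, the function names `tokenize` and `jaccard_index`, and the example sentences below are assumptions made purely for illustration, not the authors' implementation.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens.

    Assumption: simple alphanumeric word tokens; the paper may use a
    different preprocessing (stemming, stop-word removal, and so on).
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_index(text_a, text_b):
    """Return |A ∩ B| / |A ∪ B| over the token sets of the two texts."""
    a, b = tokenize(text_a), tokenize(text_b)
    if not a and not b:
        # Assumption: two empty texts are treated as having no overlap.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical text pair with similar meaning but no shared words.
pair = ("A man is playing a guitar", "Someone strums an instrument")
score = jaccard_index(*pair)
print(score)
```

A low score on a pair that is nonetheless related in meaning is the property the abstract treats as desirable: such pairs cannot be matched by word overlap alone, so they exercise the semantic part of an STS method.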