The Evaluation of Sentence Similarity Measures

Authors:
Palakorn Achananuparp;Xiaohua Hu;Xiajiong Shen
Affiliations:
College of Information Science and Technology, Drexel University, Philadelphia, PA 19104;College of Information Science and Technology, Drexel University, Philadelphia, PA 19104 and College of Computer and Information Engineering, Henan University, Henan, China;College of Computer and Information Engineering, Henan University, Henan, China
Venue:
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Year:
2008

Citing 18
Cited 15

An Information-Theoretic Definition of Similarity

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Methods for identifying versioned and plagiarized documents

Journal of the American Society for Information Science and Technology
Retrieval and novelty detection at the sentence level

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Why inverse document frequency?

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Similarity measures for tracking information flow

Proceedings of the 14th ACM international conference on Information and knowledge management
Interrogative reformulation patterns and acquisition of question paraphrases

PARAPHRASE '03 Proceedings of the second international workshop on Paraphrasing - Volume 16
A web-based kernel function for measuring the similarity of short text snippets

Proceedings of the 15th international conference on World Wide Web
Sentence Similarity Based on Semantic Nets and Corpus Statistics

IEEE Transactions on Knowledge and Data Engineering
Evaluating WordNet-based Measures of Lexical Semantic Relatedness

Computational Linguistics
Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Aspects of sentence retrieval

Aspects of sentence retrieval
A comparison of sentence retrieval techniques

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Corpus-based and knowledge-based measures of text semantic similarity

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Knowledge derived from wikipedia for computing semantic relatedness

Journal of Artificial Intelligence Research
Automatically selecting answer templates to respond to customer emails

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Extended gloss overlaps as a measure of semantic relatedness

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Similarity measures for short segments of text

ECIR'07 Proceedings of the 29th European conference on IR research
The PASCAL recognising textual entailment challenge

MLCW'05 Proceedings of the First international conference on Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognizing Textual Entailment

Addressing the Variability of Natural Language Expression in Sentence Similarity with Semantic Structure of the Sentences

PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
An enhanced framework of subjective logic for semantic document analysis

MDAI'10 Proceedings of the 7th international conference on Modeling decisions for artificial intelligence
Enhancement of subjective logic for semantic document analysis using hierarchical document signature

ICONIP'10 Proceedings of the 17th international conference on Neural information processing: theory and algorithms - Volume Part I
Word sense disambiguation-based sentence similarity

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
SyMSS: A syntax-based measure for short-text semantic similarity

Data & Knowledge Engineering
Compositional expectation: a purely distributional model of compositional semantics

IWCS '11 Proceedings of the Ninth International Conference on Computational Semantics
Using properties to compare both words and clauses

KES-AMSTA'11 Proceedings of the 5th KES international conference on Agent and multi-agent systems: technologies and applications
Semi-supervised semantic role labeling via structural alignment

Computational Linguistics
IR system evaluation using nugget-based test collections

Proceedings of the fifth ACM international conference on Web search and data mining
A clustering-based approach for discovering flaws in requirements specifications

Proceedings of the 27th Annual ACM Symposium on Applied Computing
An algorithm for fuzzy-based sentence-level document clustering for micro-level contradiction analysis

Proceedings of the International Conference on Advances in Computing, Communications and Informatics
A preference learning approach to sentence ordering for multi-document summarization

Information Sciences: an International Journal
Semantic textual similarity using maximal weighted bipartite graph matching

SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
DERI&UPM: pushing corpus based relatedness to similarity: shared task system description

SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Using clustering to improve the structure of natural language requirements documents

REFSQ'13 Proceedings of the 19th international conference on Requirements Engineering: Foundation for Software Quality

Abstract

The ability to accurately judge the similarity between natural language sentences is critical to the performance of several applications, such as text mining, question answering, and text summarization. Given two sentences, an effective similarity measure should be able to determine whether they are semantically equivalent, taking into account the variability of natural language expression. That is, the correct similarity judgment should be made even if the sentences do not share a similar surface form. In this work, we evaluate fourteen existing text similarity measures that have been used to calculate similarity scores between sentences in many text applications. The evaluation is conducted on three data sets: the TREC9 question variants, the Microsoft Research paraphrase corpus, and the third Recognizing Textual Entailment data set.
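
The surface-form problem described in the abstract can be illustrated with a minimal sketch. The code below computes a plain Jaccard word-overlap score, used here only as a generic example of a surface-level measure; the abstract does not list the fourteen measures evaluated, so this is not presented as one of them. The function names, the tokenization rule, and the example sentences are assumptions made for illustration.

```python
import re


def tokenize(sentence: str) -> set:
    """Lowercase the sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard coefficient: number of shared words divided by all distinct words."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # convention assumed here: two empty sentences count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical sentence pairs, not taken from any of the three data sets.
near_copy = jaccard_similarity("How tall is Mount Everest?", "How tall is Everest?")
paraphrase = jaccard_similarity("How tall is Mount Everest?", "What is the height of Everest?")

print(f"near copy:  {near_copy:.3f}")
print(f"paraphrase: {paraphrase:.3f}")
```

Both pairs ask the same question, yet the paraphrase shares only two of nine distinct words with the original and scores far lower than the near copy. A measure that relies on word overlap alone would therefore tend to miss semantic equivalence when the wording differs.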