Benchmarking short text semantic similarity

Authors:
James O'Shea;Zuhair Bandar;Keeley Crockett;David McLean
Affiliations:
Department of Computing and Mathematics, Manchester Metropolitan University, John Dalton Building, Chester St., Manchester M1 5GD, UK (all authors)
Venue:
International Journal of Intelligent Information and Database Systems
Year:
2010

Abstract

Short text semantic similarity measurement is a new and rapidly growing field of research. 'Short texts' are typically of sentence length but need not be grammatically correct. These measures have great potential in fields such as information retrieval, dialogue management and question answering. A dataset of 65 sentence pairs with similarity ratings, produced in 2006, has been adopted as a de facto gold standard benchmark. This paper discusses the adoption of the 2006 dataset, lays down criteria for deciding whether a dataset merits the 'gold standard' accolade, and illustrates the dataset's use as a benchmark. It also recommends procedures for generating further gold standard datasets in this field.
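
As an illustration of what using a rated dataset as a benchmark can look like in practice, the sketch below scores a similarity measure by correlating its output with human ratings. It is a minimal example under stated assumptions, not the procedure from the paper: the choice of Pearson correlation as the agreement statistic, the word-overlap (Jaccard) baseline, the function names, and the three toy sentence pairs with their ratings are all hypothetical and are not taken from the 2006 dataset.

```python
import math


def jaccard_similarity(a: str, b: str) -> float:
    """Word-overlap (Jaccard) similarity: a stand-in baseline measure,
    assumed here for illustration only."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def benchmark(measure, rated_pairs):
    """Correlate a measure's scores with the human ratings in rated_pairs.

    rated_pairs is a list of (text_a, text_b, human_rating) tuples.
    """
    machine_scores = [measure(a, b) for a, b, _ in rated_pairs]
    human_ratings = [rating for _, _, rating in rated_pairs]
    return pearson(machine_scores, human_ratings)


# Hypothetical toy data: invented pairs and invented ratings on an arbitrary
# scale. These are NOT entries from the 65-pair dataset.
TOY_PAIRS = [
    ("the cat sat on the mat", "the cat sat on the mat", 4.0),
    ("a boy kicks a ball", "a boy kicks the ball", 3.5),
    ("the car is red", "rain fell all night", 0.0),
]

if __name__ == "__main__":
    print(f"correlation with human ratings: {benchmark(jaccard_similarity, TOY_PAIRS):.3f}")
```

Any function that maps two texts to a number can be passed to `benchmark` in place of the baseline, so different measures can be compared on the same rated pairs. With only three invented pairs the resulting figure is meaningless as evidence; a real evaluation would load the published pairs and ratings instead.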