Short text semantic similarity measurement is a new and rapidly growing field of research. 'Short texts' are typically of sentence length but are not required to be grammatically correct. These measures have great potential in fields such as information retrieval, dialogue management and question answering. A dataset of 65 sentence pairs with similarity ratings, produced in 2006, has been adopted as a de facto gold standard benchmark. This paper discusses the adoption of the 2006 dataset, lays down a number of criteria for determining whether a dataset merits a 'gold standard' accolade, and illustrates the dataset's use as a benchmark. It also recommends procedures for generating further gold standard datasets in this field.