A comparative study of two short text semantic similarity measures

Authors:
James O'Shea;Zuhair Bandar;Keeley Crockett;David McLean
Affiliations:
Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom;Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom;Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom;Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
Venue:
KES-AMSTA'08 Proceedings of the 2nd KES International conference on Agent and multi-agent systems: technologies and applications
Year:
2008

Abstract

This paper describes a comparative study of two semantic similarity measures, STASIS and Latent Semantic Analysis (LSA). Both can be applied to short texts for use in Conversational Agents (CAs), computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing CAs for online customer self-service, but achievements have been limited to simple FAQ systems. We believe this is due to the labour-intensive process of scripting, which could be reduced radically by the use of short-text semantic similarity measures. "Short texts" are typically 10-20 words long and need not be grammatically correct sentences; examples include spoken utterances and text messages. We also present a benchmark data set of 65 sentence pairs with human-derived similarity ratings. This data set is the first of its kind developed specifically to evaluate such measures, and we believe it will be valuable to future researchers.
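
One plausible way to use a benchmark of this kind is to score every sentence pair with a candidate measure and then correlate the machine scores with the human ratings. The sketch below is illustrative only. It assumes Pearson correlation as the agreement statistic, which the abstract does not specify. The `jaccard_similarity` function is a toy word-overlap stand-in, not STASIS or LSA. The sentence pairs, the ratings, and the 0 to 4 rating scale are hypothetical placeholders, not items from the 65-pair data set.

```python
from math import sqrt


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Toy stand-in measure (NOT STASIS or LSA): word-set overlap in [0, 1]."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def evaluate(measure, pairs, human_ratings):
    """Correlate a measure's scores with human ratings over sentence pairs."""
    machine_scores = [measure(a, b) for a, b in pairs]
    return pearson(machine_scores, human_ratings)


# Hypothetical placeholder data: invented for illustration, not from the benchmark.
example_pairs = [
    ("the boy kicked the ball", "the boy kicked a ball"),
    ("I would like to check my balance", "please tell me my account balance"),
    ("the train leaves at noon", "my cat sleeps all day"),
]
example_ratings = [3.8, 3.1, 0.2]  # assumed 0 (unrelated) to 4 (same meaning) scale

if __name__ == "__main__":
    r = evaluate(jaccard_similarity, example_pairs, example_ratings)
    print(f"Pearson r against the placeholder ratings: {r:.3f}")
```

Any function that maps two strings to a number can be passed to `evaluate`, so different measures could be compared on the same pairs under this assumed protocol.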