A new benchmark dataset with production methodology for short text semantic similarity algorithms

  • Authors:
  • James O'Shea; Zuhair Bandar; Keeley Crockett

  • Affiliations:
  • Manchester Metropolitan University, Manchester, UK (all authors)

  • Venue:
  • ACM Transactions on Speech and Language Processing (TSLP)
  • Year:
  • 2014

Abstract

This research presents a new benchmark dataset for evaluating Short Text Semantic Similarity (STSS) measurement algorithms, together with the methodology used to create it. The power of the dataset is demonstrated by using it to compare two established algorithms, STASIS and Latent Semantic Analysis. The dataset focuses on measures for use in Conversational Agents; other potential applications include email processing and data mining of social networks. Such applications integrate an STSS algorithm into a complex system, but STSS algorithms must be evaluated in their own right, and compared with others for effectiveness, before systems integration. Semantic similarity is an artifact of human perception; therefore, its evaluation is inherently empirical and requires benchmark datasets derived from human similarity ratings. The new dataset of 64 sentence pairs, STSS-131, has been designed to meet these requirements, drawing on a range of resources from traditional grammar to cognitive neuroscience. The human ratings were obtained from a set of trials using new and improved experimental methods, with validated measures and statistics. The results illustrate the increased challenge posed by STSS-131 and its potential longevity as the Gold Standard for future STSS algorithm evaluation.
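As the abstract notes, STSS evaluation is empirical: an algorithm's machine scores are compared against human similarity ratings over the benchmark's sentence pairs, typically via a correlation coefficient. A minimal sketch of that comparison, using Pearson correlation and a handful of invented ratings on an illustrative 0-to-1 scale (the values, scale, and pair count here are hypothetical, not taken from STSS-131):

```python
from statistics import mean

def pearson(xs, ys):
    # Pearson product-moment correlation between two equal-length lists.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data for five sentence pairs: mean human similarity
# ratings from the benchmark, and the scores an STSS algorithm
# produced for the same pairs.
human_ratings  = [0.95, 0.10, 0.60, 0.35, 0.80]
machine_scores = [0.90, 0.20, 0.55, 0.40, 0.70]

print(f"Pearson r = {pearson(human_ratings, machine_scores):.3f}")
```

A higher correlation with the human gold standard indicates a better STSS measure; competing algorithms (e.g., STASIS vs. LSA) can be ranked by computing this statistic for each over the same benchmark pairs.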