TakeLab: systems for measuring semantic text similarity

Authors:
Frane Šarić;Goran Glavaš;Mladen Karan;Jan Šnajder;Bojana Dalbelo Bašić
Affiliations:
University of Zagreb;University of Zagreb;University of Zagreb;University of Zagreb;University of Zagreb
Venue:
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Year:
2012

Abstract

This paper describes two systems for determining the semantic similarity of short texts, both submitted to SemEval-2012 Task 6. Most research on the semantic similarity of textual content focuses on large documents. However, a fair amount of information is condensed into short text snippets such as social media posts, image captions, and scientific abstracts. We predict human ratings of sentence similarity using a support vector regression model with multiple features measuring word-overlap similarity and syntax similarity. Out of 89 submitted systems, our two systems ranked in the top 5 on all three overall evaluation metrics: 2nd and 3rd on overall Pearson, 1st and 3rd on normalized Pearson, and 2nd and 5th on weighted mean.
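
To make the idea of a word-overlap feature and Pearson-based evaluation concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the submitted systems: the abstract does not define the features, so the harmonic-mean overlap formula, the whitespace tokenizer, and the function names `ngram_overlap` and `pearson` are all assumptions introduced here. In a full pipeline, several such feature values per sentence pair would typically be passed to a support vector regression model; that step is omitted so the sketch stays dependency-free.

```python
from math import sqrt


def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(s1, s2, n=1):
    """Hypothetical word-overlap feature (an assumption, not taken from the abstract).

    Computes the harmonic mean of how well each sentence's n-gram set
    is covered by the other's. Tokenization is naive lowercase
    whitespace splitting.
    """
    a = ngrams(s1.lower().split(), n)
    b = ngrams(s2.lower().split(), n)
    common = len(a & b)
    if common == 0:
        return 0.0
    return 2.0 / (len(a) / common + len(b) / common)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy sentence pairs with invented ratings, for illustration only.
    pairs = [
        ("a man is playing a guitar", "a man plays a guitar", 4.5),
        ("a dog runs in the park", "the stock market fell today", 0.2),
        ("two kids are eating lunch", "two children eat lunch", 3.8),
    ]
    features = [ngram_overlap(s1, s2) for s1, s2, _ in pairs]
    gold = [score for _, _, score in pairs]
    print("overlap features:", features)
    print("Pearson vs. toy ratings:", pearson(features, gold))
```

A single surface-overlap feature like this misses paraphrases with no shared words, which is one reason to combine several feature families in a regression model rather than use one score directly.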