DeepPurple: estimating sentence semantic similarity using n-gram regression models and web snippets

Authors:
Nikos Malandrakis; Elias Iosif; Alexandros Potamianos
Affiliations:
Technical University of Crete, Chania, Greece; Technical University of Crete, Chania, Greece; Technical University of Crete, Chania, Greece
Venue:
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Year:
2012

Abstract

We estimate the semantic similarity between two sentences using regression models with three features: 1) n-gram hit rates (lexical matches) between sentences, 2) lexical semantic similarity between non-matching words, and 3) sentence length. Lexical semantic similarity is computed from co-occurrence counts on a corpus harvested from the web, using a modified mutual information metric. State-of-the-art results are obtained for semantic similarity computation at the word level; however, fusing this information at the sentence level provides only moderate improvement on Task 6 of SemEval'12. Despite the simple features used, regression models provide good performance, especially for shorter sentences, reaching a correlation of 0.62 on the SemEval test set.
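
To make the feature set in the abstract concrete, here is a minimal Python sketch of how such inputs to a regression model might be computed. It is an illustration under assumptions, not the authors' implementation: the function names (`ngrams`, `hit_rate`, `pmi`, `features`), whitespace tokenization, and the definition of hit rate as the fraction of one sentence's n-grams found in the other are all hypothetical. The `pmi` function is plain pointwise mutual information; the abstract says the paper uses a modified metric but does not specify the modification, so none is attempted here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hit_rate(tokens_a, tokens_b, n):
    """Fraction of n-grams in A that also occur in B (assumed definition).

    Counts are clipped so a repeated n-gram in A is matched at most as
    many times as it appears in B.
    """
    grams_a = Counter(ngrams(tokens_a, n))
    grams_b = Counter(ngrams(tokens_b, n))
    total = sum(grams_a.values())
    if total == 0:
        return 0.0
    hits = sum(min(count, grams_b[g]) for g, count in grams_a.items())
    return hits / total


def pmi(count_xy, count_x, count_y, total):
    """Plain pointwise mutual information (base 2) from co-occurrence counts.

    This is the standard metric, not the paper's modified version.
    """
    if count_xy == 0:
        return float("-inf")
    return math.log2((count_xy * total) / (count_x * count_y))


def features(sent_a, sent_b, max_n=2):
    """Feature vector: hit rates for n = 1..max_n, then the length of A."""
    a = sent_a.lower().split()  # assumption: simple whitespace tokenization
    b = sent_b.lower().split()
    rates = [hit_rate(a, b, n) for n in range(1, max_n + 1)]
    return rates + [float(len(a))]
```

For example, `features("the cat sat", "the cat ran")` yields a unigram hit rate of 2/3, a bigram hit rate of 1/2, and a length of 3. Vectors like this, together with a word-level similarity score for the non-matching words, could then be passed to any standard regression routine trained on human similarity ratings.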