A quantitative evaluation of linguistic tests for the automatic prediction of semantic markedness

Authors:
Vasileios Hatzivassiloglou;Kathleen McKeown
Affiliations:
Columbia University, New York, N.Y.;Columbia University, New York, N.Y.
Venue:
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Year:
1995



Abstract

We present a corpus-based study of methods that have been proposed in the linguistics literature for selecting the semantically unmarked term out of a pair of antonymous adjectives. Solutions to this problem are applicable to the more general task of selecting the positive term from the pair. Using automatically collected data, the accuracy and applicability of each method are quantified, and a statistical analysis of the significance of the results is performed. We show that some simple methods are indeed good indicators for the answer to the problem, while other proposed methods fail to perform better than would be attributable to chance. In addition, one of the simplest methods, text frequency, dominates all others. We also apply two generic statistical learning methods for combining the indications of the individual methods, and compare their performance to the simple methods. The more sophisticated of the two learning methods offers a small but statistically significant improvement over the original tests.
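
As an illustration of the kind of evaluation the abstract describes, the following is a minimal sketch of a text-frequency test: for each antonym pair it predicts the more frequent adjective as the unmarked one, then reports applicability (the share of pairs where the test gives an answer), accuracy on those pairs, and a one-sided binomial p-value against chance. The function names, the tie-handling rule, the choice of a binomial test, and all counts are assumptions made for this example; none of them are taken from the paper.

```python
from math import comb


def frequency_test(pair, freq):
    """Predict the unmarked adjective as the more frequent one; None on a tie."""
    a, b = pair
    fa, fb = freq.get(a, 0), freq.get(b, 0)
    if fa == fb:
        return None  # the test is not applicable to this pair
    return a if fa > fb else b


def evaluate(pairs, freq):
    """pairs holds (unmarked, marked) gold pairs.

    Returns (applicability, accuracy, correct, applicable).
    """
    applicable = 0
    correct = 0
    for unmarked, marked in pairs:
        guess = frequency_test((unmarked, marked), freq)
        if guess is None:
            continue
        applicable += 1
        if guess == unmarked:
            correct += 1
    applicability = applicable / len(pairs) if pairs else 0.0
    accuracy = correct / applicable if applicable else 0.0
    return applicability, accuracy, correct, applicable


def binomial_p_value(correct, n):
    """One-sided P(X >= correct) for X ~ Binomial(n, 0.5), i.e. random guessing."""
    return sum(comb(n, k) for k in range(correct, n + 1)) / 2 ** n


# Hypothetical corpus counts and gold pairs, invented for illustration only.
toy_freq = {"tall": 120, "short": 95, "good": 900, "bad": 400,
            "deep": 60, "shallow": 60, "old": 300, "young": 310}
toy_pairs = [("tall", "short"), ("good", "bad"),
             ("deep", "shallow"), ("old", "young")]

applicability, accuracy, correct, applicable = evaluate(toy_pairs, toy_freq)
p_value = binomial_p_value(correct, applicable)
print(f"applicability={applicability:.2f} accuracy={accuracy:.2f} p={p_value:.2f}")
```

With only a handful of pairs the p-value is far from significant; a test of this kind becomes informative only on a much larger set of pairs.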