Predicting strong associations on the basis of corpus data

Authors:
Yves Peirsman; Dirk Geeraerts
Affiliations:
University of Leuven, Leuven, Belgium; University of Leuven, Leuven, Belgium
Venue:
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2009



Abstract

Current approaches to the prediction of associations rely on just one type of information, generally taking the form of either word space models or collocation measures. At the moment, it is an open question how these approaches compare to one another. In this paper, we will investigate the performance of these two types of models and that of a new approach based on compounding. The best single predictor is the log-likelihood ratio, followed closely by the document-based word space model. We will show, however, that an ensemble method that combines these two best approaches with the compounding algorithm achieves an increase in performance of almost 30% over the current state of the art.
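As a rough illustration of the collocation measure named in the abstract, the log-likelihood ratio is commonly computed as the G^2 statistic over a 2x2 table of co-occurrence counts. The sketch below assumes that standard formulation; the function name, the argument names, and the way the counts are collected (for example the context window) are illustrative assumptions and are not taken from the paper.

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    Hypothetical argument layout (an assumption, not the paper's notation):
    k11: contexts containing both the cue and the candidate word
    k12: contexts containing the cue but not the candidate
    k21: contexts containing the candidate but not the cue
    k22: contexts containing neither word
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing (0 * log 0 is taken as 0)
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total
```

Under these assumptions, candidate associations for a cue word could be ranked by this score, with higher values indicating that the two words co-occur more unevenly than independence would predict. Note that the statistic does not distinguish attraction from repulsion on its own.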