Using small random samples for the manual evaluation of statistical association measures

Authors:
Stefan Evert; Brigitte Krenn
Affiliations:
IMS, University of Stuttgart, Azenbergstr. 12, 70174 Stuttgart, Germany; ÖFAI, Freyung 6/6, A-1010 Vienna, Austria
Venue:
Computer Speech and Language
Year:
2005

Abstract

In this paper, we describe the empirical evaluation of statistical association measures for the extraction of lexical collocations from text corpora. We argue that the results of an evaluation experiment cannot easily be generalized to a different setting. Consequently, such experiments have to be carried out under conditions that are as similar as possible to the intended use of the measures. Finally, we show how an evaluation strategy based on random samples can reduce the amount of manual annotation work significantly, making it possible to perform many more evaluation experiments under specific conditions.
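
The sample-based strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract gives no details, so the sketch assumes a simple setup in which the precision of an n-best candidate list is estimated from a random sample that a human would annotate. The candidate names, the `annotate` callback, the synthetic gold labels, and the choice of a Wilson score interval are all assumptions made for illustration.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("sample must not be empty")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def estimate_precision(n_best, annotate, sample_size, rng):
    """Estimate the precision of an n-best list from a random sample.

    `annotate` is a hypothetical callback standing in for the human annotator;
    it returns True when a candidate is judged a true collocation.
    """
    sample = rng.sample(n_best, min(sample_size, len(n_best)))
    hits = sum(1 for candidate in sample if annotate(candidate))
    return hits / len(sample), wilson_interval(hits, len(sample)), len(sample)


# Synthetic data (assumption): 2000 ranked candidates, about 30% true positives.
rng = random.Random(0)
n_best = [f"cand{i}" for i in range(2000)]
gold = {candidate: rng.random() < 0.3 for candidate in n_best}

# Only 200 of the 2000 candidates are "annotated" by looking up the gold label.
precision, (low, high), annotated = estimate_precision(n_best, gold.get, 200, rng)
true_precision = sum(gold.values()) / len(gold)

print(f"annotated {annotated} of {len(n_best)} candidates")
print(f"estimated precision: {precision:.3f} (interval {low:.3f} to {high:.3f})")
print(f"precision on the full list: {true_precision:.3f}")
```

In this toy setting only a tenth of the candidates are inspected, and the width of the interval shows how much uncertainty that saving introduces. A larger sample narrows the interval at the cost of more annotation work.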