Statistical filtering and subcategorization frame acquisition

Authors:
Anna Korhonen; Genevieve Gorrell; Diana McCarthy
Affiliations:
University of Cambridge, Cambridge, UK; University of Cambridge, Cambridge, UK; University of Sussex, Brighton, UK
Venue:
EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
Year:
2000

Abstract

Research into the automatic acquisition of subcategorization frames (SCFs) from corpora is starting to produce large-scale computational lexicons which include valuable frequency information. However, the accuracy of the resulting lexicons shows room for improvement. One significant source of error lies in the statistical filtering used by some researchers to remove noise from automatically acquired subcategorization frames. In this paper, we compare three different approaches to filtering out spurious hypotheses. Two hypothesis tests perform poorly, compared to filtering frames on the basis of relative frequency. We discuss reasons for this and consider directions for future research.
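To make the relative-frequency filter concrete, here is a minimal sketch, assuming that each verb comes with a table of hypothesized frames and their observed counts. The function name, the frame labels, the counts, and the 0.05 threshold are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def filter_by_relative_frequency(frame_counts, threshold=0.05):
    """Keep frames whose share of a verb's observations meets the threshold.

    frame_counts maps a frame label to the number of times the frame was
    hypothesized for one verb. The threshold value is an assumption chosen
    for illustration only.
    """
    total = sum(frame_counts.values())
    if total == 0:
        return {}
    return {
        frame: count / total
        for frame, count in frame_counts.items()
        if count / total >= threshold
    }


# Hypothetical counts for a single verb (not data from the paper).
counts = Counter({"NP": 60, "NP_PP": 30, "SCOMP": 8, "NP_NP": 2})
kept = filter_by_relative_frequency(counts, threshold=0.05)
print(sorted(kept))  # ['NP', 'NP_PP', 'SCOMP']
```

In this toy input, the frame seen in only 2 of 100 observations falls below the cutoff and is discarded as likely noise, while the other three frames are kept together with their relative frequencies.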