From grammar to lexicon: unsupervised learning of lexical syntax

Authors:
Michael R. Brent
Affiliations:
Johns Hopkins University
Venue:
Computational Linguistics - Special issue on using large corpora: II
Year:
1993

Abstract

Imagine a language that is completely unfamiliar; the only means of studying it are an ordinary grammar book and a very large corpus of text. No dictionary is available. How can easily recognized, surface grammatical facts be used to extract from a corpus as much syntactic information as possible about individual words? This paper describes an approach based on two principles. First, rely on local morpho-syntactic cues to structure rather than trying to parse entire sentences. Second, treat these cues as probabilistic rather than absolute indicators of syntactic structure. Apply inferential statistics to the data collected using the cues, rather than drawing a categorical conclusion from a single occurrence of a cue. The effectiveness of this approach for inferring the syntactic frames of verbs is supported by experiments on an English corpus using a program called Lerner. Lerner starts out with no knowledge of content words; it bootstraps from determiners, auxiliaries, modals, prepositions, pronouns, complementizers, coordinating conjunctions, and punctuation.
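To make the two principles concrete, here is a minimal sketch of how a local cue and an inferential test might fit together. The abstract does not say which cues or which statistical test Lerner uses, so everything below is an assumption for illustration. The hypothetical cue for a direct-object frame is "the verb is followed by an object pronoun, then a clause boundary", built only from closed-class words and punctuation. The test assumes a binomial model: each of the `n` occurrences of a verb shows the cue or not, and the cue can fire spuriously with an assumed error rate `p_err` even when the verb lacks the frame. The names `object_cue`, `count_cues`, `binomial_tail`, and `takes_frame`, the word lists, and the values of `p_err` and `alpha` are all hypothetical, and verb occurrences are assumed to be identified by surface form.

```python
from math import comb

# Hypothetical closed-class word lists (assumptions, not Lerner's actual cue definitions).
OBJECT_PRONOUNS = {"me", "him", "us", "them"}
CLAUSE_BOUNDARIES = {",", ".", ";", "?", "!", "and", "but", "or", "because", "if", "when"}


def object_cue(tokens, i):
    """Hypothetical local cue for a direct-object frame: the word at position i
    is followed by an object pronoun, which is in turn followed by a clause
    boundary (listed punctuation or conjunction) or the end of the sentence."""
    if i + 1 >= len(tokens) or tokens[i + 1].lower() not in OBJECT_PRONOUNS:
        return False
    return i + 2 == len(tokens) or tokens[i + 2].lower() in CLAUSE_BOUNDARIES


def count_cues(sentences, verb):
    """Return (n, m): how many times `verb` occurs, and how many of those
    occurrences are followed by the cue. Only surface forms are matched."""
    n = m = 0
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token.lower() == verb:
                n += 1
                if object_cue(tokens, i):
                    m += 1
    return n, m


def binomial_tail(n, m, p_err):
    """P(X >= m) for X ~ Binomial(n, p_err): the chance of seeing at least m
    cues if the verb lacks the frame and every cue is a false alarm."""
    return sum(comb(n, k) * p_err**k * (1 - p_err) ** (n - k) for k in range(m, n + 1))


def takes_frame(n, m, p_err=0.05, alpha=0.02):
    """Reject the null hypothesis "this verb lacks the frame" only when the
    observed cue count would be improbable under false alarms alone."""
    return n > 0 and binomial_tail(n, m, p_err) < alpha


if __name__ == "__main__":
    corpus = [s.split() for s in ["I saw him .", "She saw them and left .", "They saw ."]]
    n, m = count_cues(corpus, "saw")
    print(n, m)                 # 3 2
    print(takes_frame(n, m))    # True
    print(takes_frame(100, 3))  # False
```

Note that no single cue occurrence decides anything. With the assumed `p_err` of 0.05, two cue hits in three occurrences are unlikely to be false alarms, while three hits in a hundred occurrences are fully consistent with noise, so the frame is accepted in the first case and not in the second.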