Deducing linguistic structure from the statistics of large corpora

Authors:
Eric Brill; David Magerman; Mitchell Marcus; Beatrice Santorini
Venue:
HLT '90 Proceedings of the workshop on Speech and Natural Language
Year:
1990

Citing 3

A stochastic parts program and noun phrase parser for unrestricted text

ANLC '88 Proceedings of the second conference on Applied natural language processing
Word association norms, mutual information, and lexicography

ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
Parsing a natural language using mutual information statistics

AAAI'90 Proceedings of the eighth National conference on Artificial intelligence - Volume 2

Cited 25

Parsing the Voyager domain using Pearl

HLT '91 Proceedings of the workshop on Speech and Natural Language
Automatic acquisition of subcategorization frames from tagged text

HLT '91 Proceedings of the workshop on Speech and Natural Language
Tree insertion grammar: a cubic-time, parsable formalism that lexicalizes context-free grammar without changing the trees produced

Computational Linguistics
Improving statistical language model performance with automatically generated word hierarchies

Computational Linguistics
A Review of Statistical Language Processing Techniques

Artificial Intelligence Review
Building a large annotated corpus of English: the Penn Treebank

Computational Linguistics - Special issue on using large corpora: II
Tagging English text with a probabilistic model

Computational Linguistics
Similarity-based word sense disambiguation

Computational Linguistics - Special issue on word sense disambiguation
Restricted representation of phrase structure grammar for building a tree annotated corpus of Korean

Natural Language Engineering
Review of "Statistical language learning" by Eugene Charniak. The MIT Press 1993.

Computational Linguistics
Parsing the Wall Street Journal with the inside-outside algorithm

EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Distributional part-of-speech tagging

EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
A fast partial parse of natural language sentences using a connectionist method

EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Pearl: a probabilistic chart parser

EACL '91 Proceedings of the fifth conference on European chapter of the Association for Computational Linguistics
A multi-neuro tagger using variable lengths of contexts

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Towards history-based grammars: using richer models for probabilistic parsing

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Inside-outside reestimation from partially bracketed corpora

ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
Inside-outside reestimation from partially bracketed corpora

HLT '91 Proceedings of the workshop on Speech and Natural Language
Towards history-based grammars: using richer models for probabilistic parsing

HLT '91 Proceedings of the workshop on Speech and Natural Language
Automatically acquiring phrase structure using distributional analysis

HLT '91 Proceedings of the workshop on Speech and Natural Language
Using co-occurrence statistics as an information source for partial parsing of Chinese

CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
Grammar induction by MDL-based distributional classification

New developments in parsing technology
Unsupervised learning of Bulgarian POS tags

MorphSlav '03 Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages
A machine learning parser using an unlexicalized distituent model

CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Building a hierarchical annotated corpus of Urdu: the URDU.KON-TB treebank

CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I

Abstract

Within the last two years, approaches using both stochastic and symbolic techniques have proved adequate to deduce lexical ambiguity resolution rules with an error rate below 3-4% when trained on moderate-sized (500K-word) corpora of English text (e.g. Church, 1988; Hindle, 1989). The success of these techniques suggests that much of the grammatical structure of language may be derived automatically through distributional analysis, an approach attempted and abandoned in the 1950s.