The ATIS spoken language systems pilot corpus
HLT '90 Proceedings of the workshop on Speech and Natural Language
Multiword Expressions: A Pain in the Neck for NLP
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Data-Oriented Parsing
Using an annotated corpus as a stochastic grammar
EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Probabilistic tree-adjoining grammar as a framework for statistical natural language processing
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Stochastic lexicalized tree-adjoining grammars
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
An efficient implementation of a new DOP model
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Efficient parsing of highly ambiguous context-free grammars with bit vectors
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Theoretical evaluation of estimation methods for data-oriented parsing
EACL '06 Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations
An all-subtrees approach to unsupervised parsing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Hi-index | 0.00 |
We explore a novel computational approach to identifying "constructions" or "multi-word expressions" (MWEs) in an annotated corpus. In this approach, MWEs have no special status, but emerge in a general procedure for finding the best statistical grammar to describe the training corpus. The statistical grammar formalism used is that of stochastic tree substitution grammars (STSGs), such as used in Data-Oriented Parsing. We present an algorithm for calculating the expected frequencies of arbitrary subtrees given the parameters of an STSG, and a method for estimating the parameters of an STSG given observed frequencies in a tree bank. We report quantitative results on the ATIS corpus of phrase-structure annotated sentences, and give examples of the MWEs extracted from this corpus.