We present a simple and intuitive corpus-driven method for approximating unification-based grammars, such as HPSG, CLE, or PATR-II grammars, by context-free grammars (CFGs). Our research is motivated by the idea that large-scale, hand-written unification grammars can be exploited not only to describe natural language and to obtain a syntactic structure (and perhaps a semantic form), but also to address several practical tasks. First, the approximated CFG can serve as a cheap recognition pre-filter that speeds up deep parsing. Second, a PCFG trained on the approximated CFG provides an indirect stochastic parsing model, and thus an efficient disambiguation model, for the unification-based grammar. Third, the method can generate domain-specific subgrammars for application areas such as information extraction or question answering. Finally, it can be used to compile context-free language models that assist the acoustic model of a speech recognizer. The approximation method is unsound in that it does not generate a CFG whose language is a true superset of the language accepted by the original unification-based grammar. It is corpus-driven in that it relies on a corpus of parsed sentences and generates broader CFGs when given more input samples. Our open approach can be fine-tuned in different directions, allowing us to monotonically approach the original parse trees by shifting more information into the context-free symbols. The approach has been fully implemented in Java.
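The corpus-driven idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on simplifying assumptions: each parse-tree node carries only a coarse category and one optional agreement value in place of a full feature structure, and every name in it (`CorpusCfgSketch`, `Tree`, `extract`, `probability`, `sampleCorpus`) is hypothetical. The sketch reads one context-free rule off every local tree in a small parsed corpus, estimates PCFG probabilities by relative frequency, and uses a `fine` flag to show how shifting more information into the context-free symbols yields a more restrictive grammar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: read CFG rules and relative frequencies off parsed sentences. */
public class CorpusCfgSketch {

    /** A parse-tree node; a node without children stands for a word. */
    public static final class Tree {
        final String cat;          // coarse category, or the word itself for a leaf
        final String agr;          // assumed extra feature (e.g. "sg"); may be null
        final List<Tree> children;

        public Tree(String cat, String agr, Tree... children) {
            this.cat = cat;
            this.agr = agr;
            this.children = Arrays.asList(children);
        }
    }

    /** Maps a node to a context-free symbol; 'fine' folds the extra feature into it. */
    static String symbol(Tree t, boolean fine) {
        return (fine && t.agr != null) ? t.cat + "[" + t.agr + "]" : t.cat;
    }

    /** Counts each local tree (parent symbol -> child symbols) as one rule occurrence. */
    static void collect(Tree node, boolean fine, Map<String, Map<String, Integer>> counts) {
        if (node.children.isEmpty()) {
            return; // words introduce no rules of their own
        }
        StringBuilder rhs = new StringBuilder();
        for (Tree child : node.children) {
            if (rhs.length() > 0) {
                rhs.append(' ');
            }
            rhs.append(symbol(child, fine));
            collect(child, fine, counts);
        }
        counts.computeIfAbsent(symbol(node, fine), k -> new TreeMap<>())
              .merge(rhs.toString(), 1, Integer::sum);
    }

    /** Extracts rule counts from a whole corpus; more trees can only add rules. */
    public static Map<String, Map<String, Integer>> extract(List<Tree> corpus, boolean fine) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Tree tree : corpus) {
            collect(tree, fine, counts);
        }
        return counts;
    }

    /** Relative-frequency estimate: count(lhs -> rhs) / count(lhs -> anything). */
    public static double probability(Map<String, Map<String, Integer>> counts,
                                     String lhs, String rhs) {
        Map<String, Integer> byRhs = counts.get(lhs);
        if (byRhs == null || !byRhs.containsKey(rhs)) {
            return 0.0;
        }
        int total = 0;
        for (int c : byRhs.values()) {
            total += c;
        }
        return byRhs.get(rhs) / (double) total;
    }

    /** Two toy parses: "she sleeps" (singular) and "they sleep" (plural). */
    public static List<Tree> sampleCorpus() {
        Tree sg = new Tree("S", null,
                new Tree("NP", "sg", new Tree("she", null)),
                new Tree("VP", "sg", new Tree("sleeps", null)));
        Tree pl = new Tree("S", null,
                new Tree("NP", "pl", new Tree("they", null)),
                new Tree("VP", "pl", new Tree("sleep", null)));
        return Arrays.asList(sg, pl);
    }

    public static void main(String[] args) {
        for (boolean fine : new boolean[] {false, true}) {
            System.out.println(fine ? "fine-grained symbols:" : "coarse symbols:");
            Map<String, Map<String, Integer>> counts = extract(sampleCorpus(), fine);
            for (Map.Entry<String, Map<String, Integer>> lhs : counts.entrySet()) {
                for (String rhs : lhs.getValue().keySet()) {
                    System.out.printf("  %s -> %s  (p=%.2f)%n",
                            lhs.getKey(), rhs, probability(counts, lhs.getKey(), rhs));
                }
            }
        }
    }
}
```

With coarse symbols, the toy grammar contains a single rule `S -> NP VP` and therefore also accepts "she sleep". With agreement folded into the symbols, the `S` rules are split by number and that string is no longer derivable. Rules that never occur in the corpus are simply absent, which reflects why an approximation of this kind is not guaranteed to cover the full language of the original grammar.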