This thesis takes up the problem of syntactic comprehension, or parsing: how an agent (human or machine) with knowledge of a specific language goes about inferring the hierarchical structural relationships underlying a surface string in the language. I take the position that probabilistic models of combining evidential information are cognitively plausible and practically useful for syntactic comprehension. In particular, the thesis applies probabilistic methods to two areas: the relationship between word order and psycholinguistic models of comprehension, and the practical problems of accuracy and efficiency in parsing sentences with syntactic discontinuity.
On the psychological side, the thesis proposes a theory of expectation-based processing difficulty as a consequence of probabilistic syntactic disambiguation: the ease of processing a word during comprehension is determined primarily by the degree to which that word is expected. I identify a class of syntactic phenomena, associated primarily with verb-final clause order, where the predictions of expectation-based processing diverge most sharply from more established locality-based theories of processing difficulty. Using existing probabilistic parsing algorithms and syntactically annotated data sources, I show that the expectation-based theory matches a range of established experimental psycholinguistic results better than locality-based theories. The comparison of probabilistic and locality-driven processing theories is a crucial area of psycholinguistic research because of its implications for the relationship between linguistic production and comprehension, and more generally for theories of modularity in cognitive science.
The thesis also takes up the problem of probabilistic models for discontinuous constituency, in which phrases do not consist of contiguous substrings of a sentence. Discontinuity poses a computational challenge in parsing because it expands the set of possible substructures in a sentence beyond the bound, quadratic in sentence length, on the set of possible continuous constituents. For discontinuous constituency, I investigate two problems. The first is accuracy, addressed with discriminative classifiers that are organized on principles of syntactic theory and used to introduce discontinuous relationships into otherwise strictly context-free phrase structure trees. The second is efficiency in joint inference over both continuous and discontinuous structures, addressed with probabilistic instantiations of mildly context-sensitive grammatical formalisms and a factorization of grammatical generalizations into probabilistic components of dominance and linear order.
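The growth in candidate substructures described above can be illustrated with a small counting sketch. It is not taken from the thesis: the function names are hypothetical, and the cap on the number of contiguous blocks per constituent is an assumption made only for illustration. A continuous constituent over n words is a span (i, j), so there are n(n+1)/2 of them; once a constituent may cover several separated blocks of words, the count grows faster.

```python
from itertools import combinations


def continuous_spans(n):
    """Return every contiguous span (i, j), 0 <= i < j <= n, over n words."""
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]


def count_word_sets(n, max_blocks):
    """Count non-empty sets of word positions made of at most max_blocks
    contiguous blocks (max_blocks is an illustrative assumption)."""
    count = 0
    for size in range(1, n + 1):
        for positions in combinations(range(n), size):
            # A new block starts wherever two chosen positions are not adjacent.
            gaps = sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)
            if gaps + 1 <= max_blocks:
                count += 1
    return count


if __name__ == "__main__":
    for n in (5, 10):
        print(n, len(continuous_spans(n)), count_word_sets(n, 2), count_word_sets(n, n))
```

With max_blocks set to 1 the second function counts the same candidates as the first, which gives a quick sanity check; with no effective cap (max_blocks equal to n) it counts every non-empty subset of the words, that is, 2^n - 1 candidates.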