This thesis focuses on the development of effective and efficient language models (LMs) for speech recognition systems. We selected Constraint Dependency Grammar (CDG) as the underlying framework because CDG parses can be lexicalized at the word level with a rich set of lexical features for modeling subcategorization and wh-movement without a combinatorial explosion of the parameter space, and because CDG is able to model languages with crossing dependencies and free word ordering. Two types of LMs were developed: an almost-parsing LM and a full parser-based LM. The quality of these LMs gained significantly from the insights obtained from initial CDG grammar induction experiments. The almost-parsing LM uses a data structure derived from CDG parses, called a SuperARV, that tightly integrates knowledge of words, lexical features, and syntactic constraints. The full CDG parser-based LM utilizes complete parse information, obtained by adding the modifiee links to the SuperARVs assigned to each word in a sentence, in order to capture important long-distance dependency constraints. We evaluated the almost-parsing LM on a variety of large vocabulary continuous speech recognition (LVCSR) tasks and found that it reduced recognition error rates significantly compared to commonly used word-based LMs, achieving performance competitive with state-of-the-art parser-based LMs at a significantly lower time complexity. The full CDG parser-based LM, when evaluated on the DARPA Wall Street Journal CSR task, outperformed the almost-parsing LM and achieved performance comparable to or exceeding that of state-of-the-art parser-based LMs.
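To make the SuperARV idea concrete, the following is a minimal, hypothetical sketch of such a structure: it bundles a word with its lexical features and CDG role values, and optionally carries modifiee links (the extra information the full parser-based LM adds). All field names and the toy feature inventory here are illustrative assumptions, not the thesis's actual definitions.

```python
from dataclasses import dataclass

# Hypothetical sketch of a SuperARV-like record. The exact contents of a
# real SuperARV (feature set, role-value encoding) differ; this only
# illustrates the tight word/feature/constraint integration described above.
@dataclass(frozen=True)
class SuperARV:
    word: str                    # surface word
    lex_features: tuple          # e.g. POS tag, agreement, subcat flags
    role_values: tuple           # CDG role values assigned by the parse
    modifiee_offsets: tuple = () # relative positions of modifiees; empty for
                                 # the almost-parsing LM, filled in by the
                                 # full parser-based LM

def lm_history(tagged_tokens):
    """Map a tagged sentence to the sequence of SuperARVs that an
    almost-parsing LM would condition its probabilities on, instead of
    conditioning on bare words."""
    return [SuperARV(w, feats, roles) for (w, feats, roles) in tagged_tokens]

# Toy tagged sentence (illustrative tags only).
tagged = [
    ("the",   ("DT",),         ("det",)),
    ("dog",   ("NN", "3sg"),   ("subj",)),
    ("barks", ("VBZ", "3sg"),  ("root",)),
]
history = lm_history(tagged)
print(len(history), history[1].word)  # 3 dog
```

Conditioning on (word, SuperARV) pairs rather than words alone is what lets an n-gram-style model capture "almost-parsing" syntactic information without running a full parser at recognition time.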