State-of-the-art natural language processing models are anything but compact. Syntactic parsers have huge grammars, machine translation systems have huge transfer tables, and so on across a range of tasks. With such complexity come two challenges. First, how can we learn highly complex models? Second, how can we efficiently infer optimal structures within them? Hierarchical coarse-to-fine methods address both questions. Coarse-to-fine approaches exploit a sequence of models that introduce complexity gradually. At the top of the sequence is a trivial model in which learning and inference are both cheap. Each subsequent model refines the previous one, until a final, full-complexity model is reached. Because each refinement introduces only limited complexity, both learning and inference can be done incrementally.

In this dissertation, we describe several coarse-to-fine systems. In the domain of syntactic parsing, the complexity is in the grammar. We present a latent variable approach which begins with an X-bar grammar and learns to iteratively refine grammar categories. For example, noun phrases might be split into subcategories for subjects and objects, singular and plural, and so on. This splitting process admits an efficient incremental inference scheme which reduces parsing times by orders of magnitude. Furthermore, it produces the best parsing accuracies across an array of languages, in a fully language-general fashion.

In the domain of acoustic modeling for speech recognition, complexity is needed to model the rich phonetic properties of natural languages. Starting from a mono-phone model, we learn increasingly refined models that automatically capture phone-internal structure as well as context-dependent variation. Our approach reduces error rates compared to baseline approaches, while streamlining the learning procedure.

In the domain of machine translation, complexity arises because there are too many target-language word types. To manage this complexity, we translate into target-language clusterings of increasing vocabulary size. This approach gives dramatic speed-ups while additionally improving final translation quality.
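To make the coarse-to-fine idea concrete, the sketch below (illustrative only, not code from the dissertation) shows the pruning step for parsing: a tiny coarse PCFG is parsed with inside-outside, posterior probabilities are computed for labeled spans, and only spans whose coarse posterior clears a threshold are handed on to a refined grammar. The toy grammar, sentence, and threshold are assumptions chosen for readability.

```python
# Minimal coarse-to-fine pruning sketch: compute labeled-span posteriors under
# a small coarse grammar and keep only items above a threshold. Toy grammar,
# sentence, and threshold are illustrative assumptions, not the dissertation's.
from collections import defaultdict

# Coarse grammar in CNF: binary rules (parent, (left, right)) and lexical
# rules (symbol, word), each mapped to its probability.
COARSE_BINARY = {("S", ("NP", "VP")): 1.0,
                 ("NP", ("DT", "NN")): 0.7,
                 ("VP", ("VB", "NP")): 1.0}
COARSE_LEX = {("NP", "dogs"): 0.3, ("DT", "the"): 1.0,
              ("NN", "cat"): 1.0, ("VB", "chase"): 1.0}

def inside(words, binary, lex):
    """Inside (bottom-up) chart: (i, j, symbol) -> inside probability."""
    n = len(words)
    chart = defaultdict(float)
    for i, w in enumerate(words):
        for (sym, word), p in lex.items():
            if word == w:
                chart[i, i + 1, sym] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, (l, r)), p in binary.items():
                    chart[i, j, parent] += p * chart[i, k, l] * chart[k, j, r]
    return chart

def outside(words, binary, inside_chart, root="S"):
    """Outside (top-down) chart, seeded at the root span."""
    n = len(words)
    chart = defaultdict(float)
    chart[0, n, root] = 1.0
    for width in range(n, 1, -1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, (l, r)), p in binary.items():
                    out = chart[i, j, parent]
                    if out == 0.0:
                        continue
                    chart[i, k, l] += p * out * inside_chart[k, j, r]
                    chart[k, j, r] += p * out * inside_chart[i, k, l]
    return chart

def coarse_to_fine_mask(words, threshold=1e-4, root="S"):
    """Return the set of (i, j, coarse_symbol) items the refined pass may build."""
    ins = inside(words, COARSE_BINARY, COARSE_LEX)
    outs = outside(words, COARSE_BINARY, ins, root)
    z = ins[0, len(words), root]  # sentence probability under the coarse grammar
    return {item for item in ins if ins[item] * outs[item] / z > threshold}

if __name__ == "__main__":
    sent = "dogs chase the cat".split()
    for i, j, sym in sorted(coarse_to_fine_mask(sent)):
        print(f"keep [{i},{j}) {sym}")
```

In a full system the refined grammar's split symbols (e.g. hypothetical NP-0, NP-1) project back onto coarse symbols such as NP, so a mask computed on the coarse pass directly constrains the refined chart; repeating this over a hierarchy of progressively split grammars yields the multi-pass behavior described above.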