State-of-the-art natural language processing models are anything but compact. Syntactic parsers have huge grammars, machine translation systems have huge transfer tables, and so on across a range of tasks. With such complexity come two challenges. First, how can we learn highly complex models? Second, how can we efficiently infer optimal structures within them? Hierarchical coarse-to-fine methods address both questions. Coarse-to-fine approaches exploit a sequence of models that introduce complexity gradually. At the top of the sequence is a trivial model in which learning and inference are both cheap. Each subsequent model refines the previous one, until a final, full-complexity model is reached. Because each refinement introduces only limited complexity, both learning and inference can be done incrementally.

In this dissertation, we describe several coarse-to-fine systems. In the domain of syntactic parsing, the complexity is in the grammar. We present a latent variable approach which begins with an X-bar grammar and learns to iteratively refine grammar categories. For example, noun phrases might be split into subcategories for subjects and objects, singular and plural, and so on. This splitting process admits an efficient incremental inference scheme which reduces parsing times by orders of magnitude. Furthermore, it produces the best parsing accuracies across an array of languages, in a fully language-general fashion.

In the domain of acoustic modeling for speech recognition, complexity is needed to model the rich phonetic properties of natural languages. Starting from a mono-phone model, we learn increasingly refined models that automatically capture phone-internal structure as well as context-dependent variation. Our approach reduces error rates compared to baseline approaches, while streamlining the learning procedure.

In the domain of machine translation, complexity arises because there are too many target-language word types. To manage this complexity, we translate into target-language clusterings of increasing vocabulary size. This approach gives dramatic speed-ups while additionally improving final translation quality.
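To make the coarse-to-fine idea concrete, the sketch below (illustrative only, not code from the dissertation) shows the pruning step for parsing: a tiny coarse PCFG is parsed with inside-outside, posterior probabilities are computed for labeled spans, and only spans whose coarse posterior clears a threshold are handed on to a refined grammar. The toy grammar, sentence, and threshold are assumptions chosen for readability.

```python
# Minimal coarse-to-fine pruning sketch: compute labeled-span posteriors under
# a small coarse grammar and keep only items above a threshold. Toy grammar,
# sentence, and threshold are illustrative assumptions, not the dissertation's.
from collections import defaultdict

# Coarse grammar in CNF: binary rules (parent, (left, right)) and lexical
# rules (symbol, word), each mapped to its probability.
COARSE_BINARY = {("S", ("NP", "VP")): 1.0,
                 ("NP", ("DT", "NN")): 0.7,
                 ("VP", ("VB", "NP")): 1.0}
COARSE_LEX = {("NP", "dogs"): 0.3, ("DT", "the"): 1.0,
              ("NN", "cat"): 1.0, ("VB", "chase"): 1.0}

def inside(words, binary, lex):
    """Inside (bottom-up) chart: (i, j, symbol) -> inside probability."""
    n = len(words)
    chart = defaultdict(float)
    for i, w in enumerate(words):
        for (sym, word), p in lex.items():
            if word == w:
                chart[i, i + 1, sym] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, (l, r)), p in binary.items():
                    chart[i, j, parent] += p * chart[i, k, l] * chart[k, j, r]
    return chart

def outside(words, binary, inside_chart, root="S"):
    """Outside (top-down) chart, seeded at the root span."""
    n = len(words)
    chart = defaultdict(float)
    chart[0, n, root] = 1.0
    for width in range(n, 1, -1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, (l, r)), p in binary.items():
                    out = chart[i, j, parent]
                    if out == 0.0:
                        continue
                    chart[i, k, l] += p * out * inside_chart[k, j, r]
                    chart[k, j, r] += p * out * inside_chart[i, k, l]
    return chart

def coarse_to_fine_mask(words, threshold=1e-4, root="S"):
    """Return the set of (i, j, coarse_symbol) items the refined pass may build."""
    ins = inside(words, COARSE_BINARY, COARSE_LEX)
    outs = outside(words, COARSE_BINARY, ins, root)
    z = ins[0, len(words), root]  # sentence probability under the coarse grammar
    return {item for item in ins if ins[item] * outs[item] / z > threshold}

if __name__ == "__main__":
    sent = "dogs chase the cat".split()
    for i, j, sym in sorted(coarse_to_fine_mask(sent)):
        print(f"keep [{i},{j}) {sym}")
```

In a full system the refined grammar's split symbols (e.g. hypothetical NP-0, NP-1) project back onto coarse symbols such as NP, so a mask computed on the coarse pass directly constrains the refined chart; repeating this over a hierarchy of progressively split grammars yields the multi-pass behavior described above.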