Probabilistic reasoning in intelligent systems: networks of plausible inference
Empirical methods for artificial intelligence
A maximum entropy approach to natural language processing
Computational Linguistics
Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence
Statistical methods for speech recognition
Introduction To Automata Theory, Languages, And Computation
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars
Computational Linguistics - Special issue on using large corpora: I
PCFG models of linguistic tree representations
Computational Linguistics
Estimators for stochastic "Unification-Based" grammars
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Relating probabilistic grammars and automata
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Lessons from a failure: generating tailored smoking cessation letters
Artificial Intelligence
Deep syntactic processing by combining shallow methods
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Conditional structure versus conditional estimation in NLP models
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
PCFG parsing for restricted classical Chinese texts
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Discriminative training of a neural network statistical parser
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Contrastive estimation: training log-linear models on unlabeled data
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Competitive generative models with structure learning for NLP classification tasks
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Sparse multi-scale grammars for discriminative latent variable parsing
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Efficient extraction of grammatical relations
Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
Integrating Generative and Discriminative Character-Based Models for Chinese Word Segmentation
ACM Transactions on Asian Language Information Processing (TALIP)
Higher-order constituent parsing and parser combination
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
An information-theoretic measure to evaluate parsing difficulty across treebanks
ACM Transactions on Speech and Language Processing (TSLP)
This paper compares two different ways of estimating statistical language models. Many statistical NLP tagging and parsing models are estimated by maximizing the (joint) likelihood of the fully observed training data. However, since these applications only require conditional probability distributions, those distributions can in principle be learned by maximizing the conditional likelihood of the training data instead. Perhaps surprisingly, models estimated by maximizing the joint likelihood were superior to models estimated by maximizing the conditional likelihood, even though some of the latter models intuitively had access to "more information".
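The two estimation objectives contrasted in the abstract can be illustrated on a toy classification task. The sketch below (hypothetical data and names, not from the paper) fits a naive Bayes model by relative-frequency counts, which maximizes the joint likelihood P(x, y), and a logistic regression over the same features by gradient ascent on the conditional log-likelihood sum of log P(y|x). On the training data the conditionally trained model necessarily scores at least as well on the conditional objective, since it optimizes it directly; the paper's surprising finding concerns held-out performance, which this sketch does not measure.

```python
import math
from collections import Counter, defaultdict

# Toy binary-feature data: (features, label). Hypothetical illustration data.
DATA = [((1, 1), 1), ((1, 0), 1), ((1, 1), 1), ((0, 0), 0),
        ((0, 1), 0), ((0, 0), 0), ((1, 0), 0), ((0, 1), 1)]

def train_joint(data):
    """Naive Bayes: relative-frequency (count-based) estimates maximize
    the JOINT likelihood P(x, y) of the fully observed training data."""
    label_counts = Counter(y for _, y in data)
    feat_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for x, y in data:
        for i, v in enumerate(x):
            feat_counts[(i, y)][v] += 1

    def p_y_given_x(x):
        scores = {}
        for y, cy in label_counts.items():
            s = math.log(cy / len(data))
            for i, v in enumerate(x):
                c = feat_counts[(i, y)]
                # add-one smoothing over the two binary feature values
                s += math.log((c[v] + 1) / (sum(c.values()) + 2))
            scores[y] = s
        z = math.log(sum(math.exp(s) for s in scores.values()))
        return {y: math.exp(s - z) for y, s in scores.items()}
    return p_y_given_x

def train_conditional(data, epochs=200, lr=0.5):
    """Logistic regression over the same features: stochastic gradient
    ascent on the CONDITIONAL log-likelihood sum_i log P(y_i | x_i)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p1 = 1 / (1 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
            err = y - p1  # gradient of the log-likelihood w.r.t. the logit
            b += lr * err
            for i, xi in enumerate(x):
                w[i] += lr * err * xi

    def p_y_given_x(x):
        p1 = 1 / (1 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
        return {0: 1 - p1, 1: p1}
    return p_y_given_x

def cond_log_lik(model, data):
    return sum(math.log(model(x)[y]) for x, y in data)

joint_model = train_joint(DATA)
cond_model = train_conditional(DATA)
print("joint-trained  conditional log-lik:", round(cond_log_lik(joint_model, DATA), 3))
print("cond-trained   conditional log-lik:", round(cond_log_lik(cond_model, DATA), 3))
```

Both models produce the same kind of object — a conditional distribution P(y|x) — which is why the abstract can compare them head to head: the difference lies only in which likelihood was maximized during estimation.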