We derive from first principles the basic equations for several standard hidden-Markov-model word taggers, as well as equations for other models that may be novel (the descriptions in previous papers are too spare to be sure). We report performance results for all of the models. The result from our best model (96.45% on an unused test sample from the Brown corpus with 181 distinct tags) is at the upper edge of reported results. We also hope these results clear up some confusion in the literature about the best equations to use. However, the main purpose of this paper is to show how the equations for a variety of models may be derived, and thereby to encourage future authors to give the equations for their models and the derivations thereof.
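To make the kind of model the abstract describes concrete, here is a minimal sketch of Viterbi decoding for a bigram HMM tagger, which selects the tag sequence t_1..t_n maximizing the product of P(t_i | t_{i-1}) * P(w_i | t_i). This is the standard formulation of such taggers, not necessarily the exact equations derived in the paper, and all probabilities below are toy numbers for illustration, not estimates from the Brown corpus.

```python
import math

def viterbi(words, tags, trans, emit, start):
    """Return the most probable tag sequence under a bigram HMM.

    trans[(s, t)] = P(t | s), emit[(t, w)] = P(w | t), start[t] = P(t at i=0).
    Unseen events get a tiny floor probability instead of zero.
    """
    FLOOR = 1e-12
    # best[i][t] = (log-prob of the best path ending in tag t at position i,
    #               backpointer to the previous tag on that path)
    best = [{}]
    for t in tags:
        p = start.get(t, FLOOR) * emit.get((t, words[0]), FLOOR)
        best[0][t] = (math.log(p), None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = math.log(emit.get((t, words[i]), FLOOR))
            score, prev = max(
                (best[i - 1][s][0] + math.log(trans.get((s, t), FLOOR)) + e, s)
                for s in tags
            )
            best[i][t] = (score, prev)
    # Backtrace from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

# Toy model: "flies" is ambiguous between noun and verb; context resolves it.
tags = ["N", "V", "D"]
start = {"D": 0.8, "N": 0.1, "V": 0.1}
trans = {("D", "N"): 0.9, ("N", "V"): 0.6, ("N", "N"): 0.3, ("V", "N"): 0.4}
emit = {("D", "the"): 0.9, ("N", "plane"): 0.4,
        ("V", "flies"): 0.5, ("N", "flies"): 0.1}

print(viterbi(["the", "plane", "flies"], tags, trans, emit, start))
# → ['D', 'N', 'V']
```

Decoding in log space avoids numerical underflow on long sentences; the probability floor stands in for the smoothing that a real tagger would estimate from training data.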