I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance, and I examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better machine learning or from better features in a discriminative sequence classifier. The prospects for further gains from semi-supervised learning also seem quite limited. Rather, I suggest and begin to demonstrate that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained, that is, from improved descriptive linguistics. However, I conclude by suggesting that there are also limits to this process: the status of some words may not be adequately captured by assigning them to one of a small number of categories. While conventions can be used in such cases to improve tagging consistency, they lack a strong linguistic basis.
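
The gap between the two figures quoted above (token accuracy versus sentence accuracy) is easier to see with a small calculation. The following is a minimal sketch, not the evaluation code used for the Stanford tagger: the function names `tagging_accuracy` and `expected_sentence_accuracy`, the toy tag sequences, and the 21-token sentence length are all assumptions made for illustration. The second function additionally assumes that tagging errors are independent across tokens, which real taggers do not satisfy exactly.

```python
from typing import List, Tuple


def tagging_accuracy(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float]:
    """Return (token_accuracy, sentence_accuracy) for parallel tag sequences.

    A sentence counts as correct only if every one of its tags is correct.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")

    total_tokens = 0
    correct_tokens = 0
    correct_sentences = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence must have one predicted tag per token")
        matches = sum(g == p for g, p in zip(gold_tags, pred_tags))
        total_tokens += len(gold_tags)
        correct_tokens += matches
        if matches == len(gold_tags):
            correct_sentences += 1

    if total_tokens == 0:
        raise ValueError("corpus contains no tokens")
    return correct_tokens / total_tokens, correct_sentences / len(gold)


def expected_sentence_accuracy(token_accuracy: float, n_tokens: int) -> float:
    """Sentence accuracy for an n-token sentence, ASSUMING independent errors."""
    return token_accuracy ** n_tokens


# Toy data (hypothetical): one wrong tag out of seven tokens.
gold = [["DT", "NN", "VBZ"], ["PRP", "VBD", "DT", "NN"]]
pred = [["DT", "NN", "VBZ"], ["PRP", "VBD", "DT", "JJ"]]
token_acc, sentence_acc = tagging_accuracy(gold, pred)
print(f"token accuracy:    {token_acc:.3f}")
print(f"sentence accuracy: {sentence_acc:.3f}")

# Illustrative only: 97.3% per-token accuracy over an assumed 21-token sentence.
print(f"independence estimate: {expected_sentence_accuracy(0.973, 21):.3f}")
```

Under the independence assumption, a per-token accuracy of 97.3% compounds to a sentence accuracy of roughly 56% for sentences of about 21 tokens, which is one way to read why a seemingly high token accuracy still leaves nearly half of all sentences with at least one tagging error.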