Detecting errors in part-of-speech annotation

Authors:
Markus Dickinson; W. Detmar Meurers
Affiliations:
The Ohio State University; The Ohio State University
Venue:
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

We propose a new method for detecting errors in "gold-standard" part-of-speech annotation. The approach locates errors with high precision by identifying n-grams that occur in the corpus more than once with different taggings. Two further techniques, closed-class analysis and finite-state tagging guide patterns, are also discussed. The success of the three approaches is illustrated on the Wall Street Journal corpus of the Penn Treebank.
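The core idea of the first method, n-grams that recur with different taggings, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `variation_ngrams`, the flat list-of-(word, tag) corpus representation, and the toy corpus are assumptions made for illustration. The sketch collects every word n-gram, records the distinct tag sequences each one receives, and keeps those seen with more than one tagging, together with the positions where the tags disagree.

```python
from collections import defaultdict


def variation_ngrams(corpus, n):
    """Find word n-grams that occur with more than one tag sequence.

    corpus: flat list of (word, tag) pairs (hypothetical representation).
    Returns {word n-gram: (sorted distinct tag sequences, varying positions)}.
    """
    taggings = defaultdict(set)
    # Slide an n-token window over the corpus (windows may cross sentences).
    for i in range(len(corpus) - n + 1):
        window = corpus[i:i + n]
        words = tuple(word for word, _ in window)
        tags = tuple(tag for _, tag in window)
        taggings[words].add(tags)

    result = {}
    for words, tag_seqs in taggings.items():
        if len(tag_seqs) > 1:
            # Positions whose tag differs across occurrences: the candidates
            # for an annotation error.
            varying = [k for k in range(n)
                       if len({seq[k] for seq in tag_seqs}) > 1]
            result[words] = (sorted(tag_seqs), varying)
    return result


# Invented toy corpus: "off" is tagged RP once and IN once.
corpus = [
    ("to", "TO"), ("ward", "VB"), ("off", "RP"), ("a", "DT"),
    ("takeover", "NN"), (".", "."),
    ("to", "TO"), ("ward", "VB"), ("off", "IN"), ("a", "DT"),
    ("takeover", "NN"), (".", "."),
]

for words, (tag_seqs, varying) in variation_ngrams(corpus, 3).items():
    print(" ".join(words), tag_seqs, "varying positions:", varying)
```

A varying position is only a candidate for manual review, not a confirmed error, since some words are genuinely ambiguous. The sketch also simplifies by letting windows cross sentence boundaries and by applying no further filtering to the n-grams it returns.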