We propose a new method for detecting errors in "gold-standard" part-of-speech annotation. The approach locates errors with high precision based on n-grams that occur in the corpus with multiple taggings. Two further techniques, closed-class analysis and finite-state tagging guide patterns, are also discussed. The success of the three approaches is illustrated on the Wall Street Journal corpus, which is part of the Penn Treebank.
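The core idea of looking for n-grams with multiple taggings can be illustrated with a minimal sketch. It is not the paper's actual procedure: it assumes the corpus is available as lists of (word, tag) pairs, uses a fixed n, and simply reports every word n-gram that appears with more than one tag sequence. The function name `variation_ngrams` and the toy sentences are hypothetical, made up purely for illustration.

```python
from collections import defaultdict


def variation_ngrams(tagged_sentences, n=3):
    """Return word n-grams that occur with more than one tag sequence.

    tagged_sentences: iterable of sentences, each a list of (word, tag) pairs.
    Result maps each varying word n-gram to {tag_sequence: count}.
    """
    taggings = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        # Slide a window of n tokens over the sentence.
        for i in range(len(sentence) - n + 1):
            window = sentence[i:i + n]
            words = tuple(word for word, _ in window)
            tags = tuple(tag for _, tag in window)
            taggings[words][tags] += 1
    # Keep only n-grams whose identical words received different taggings.
    return {
        words: dict(tag_counts)
        for words, tag_counts in taggings.items()
        if len(tag_counts) > 1
    }


# Toy data (invented): "stock" is tagged NN twice and JJ once in the same context.
corpus = [
    [("the", "DT"), ("stock", "NN"), ("market", "NN"), ("fell", "VBD")],
    [("the", "DT"), ("stock", "NN"), ("market", "NN"), ("rose", "VBD")],
    [("the", "DT"), ("stock", "JJ"), ("market", "NN"), ("closed", "VBD")],
]
result = variation_ngrams(corpus, n=3)
for words, tag_counts in result.items():
    print(" ".join(words), tag_counts)
```

An n-gram flagged this way is only a candidate: the same words can legitimately carry different tags in different contexts, so a longer identical context makes a genuine annotation error more plausible. In this sketch, the minority tagging of a flagged n-gram is a natural starting point for manual inspection.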