We present a method for automatically detecting errors in a manually marked corpus using anomaly detection. Anomaly detection is a method for determining which elements of a large data set do not conform to the whole. This method fits a probability distribution over the data and applies a statistical test to detect anomalous elements. In the corpus error detection problem, anomalous elements are typically marking errors. We present the results of applying this method to the tagged portion of the Penn Treebank corpus.
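To make the idea concrete, here is a minimal sketch of the general approach, not the method used in the paper. It assumes a much simpler model: the "probability distribution" is just the relative frequency of each tag for a given word, and the "statistical test" is a plain probability threshold. The function name `find_anomalous_tags` and the parameters `threshold` and `min_count` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def find_anomalous_tags(tagged_tokens, threshold=0.1, min_count=5):
    """Return (index, word, tag, probability) for tags that are rare for their word.

    Assumption: P(tag | word) is estimated by relative frequency, and a token
    is called anomalous when that estimate falls below `threshold`.
    """
    # Fit the (very simple) distribution: tag counts per word.
    word_tag_counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        word_tag_counts[word][tag] += 1

    anomalies = []
    for i, (word, tag) in enumerate(tagged_tokens):
        counts = word_tag_counts[word]
        total = sum(counts.values())
        # Skip words seen too rarely to give a meaningful estimate.
        if total < min_count:
            continue
        p = counts[tag] / total
        if p < threshold:
            anomalies.append((i, word, tag, p))
    return anomalies


# Toy corpus: "the" is tagged DT 19 times and NN once.
corpus = [("the", "DT")] * 19 + [("the", "NN")] + [("dog", "NN")] * 3
flagged = find_anomalous_tags(corpus)
print(flagged)
```

On this toy corpus the single `("the", "NN")` token is flagged as a candidate tagging error, while `dog` is skipped because it occurs fewer than `min_count` times. Flagged tokens are only candidates: a rare tag can be correct, so they would still need manual review.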