Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora

  • Authors:
  • Harald Berthelsen; Beáta Megyesi

  • Venue:
  • TSD '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
  • Year:
  • 2000

Abstract

In this paper we apply the ensemble approach to the identification of incorrectly annotated items (noise) in a training set. In a controlled experiment, memory-based, decision tree-based and transformation-based classifiers are used as a filter to detect and remove noise deliberately introduced into a manually tagged corpus. The results indicate that the method can be successfully applied to automatically detect errors in a corpus.
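
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The three lookup-table learners below are hypothetical stand-ins for the memory-based, decision tree-based and transformation-based classifiers. The fold scheme, the consensus voting rule, the feature names (`word`, `prev_tag`) and the toy corpus are likewise assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def train_lookup(key_fn):
    """Build a trainer for a toy classifier that memorises the most
    frequent tag per key (a stand-in, not one of the paper's learners)."""
    def train(data):
        counts = defaultdict(Counter)
        overall = Counter()
        for features, tag in data:
            counts[key_fn(features)][tag] += 1
            overall[tag] += 1
        default = overall.most_common(1)[0][0]  # fallback for unseen keys

        def predict(features):
            seen = counts.get(key_fn(features))
            return seen.most_common(1)[0][0] if seen else default
        return predict
    return train


# Three hypothetical learners, each looking at different evidence.
TRAINERS = [
    train_lookup(lambda f: f["word"]),                 # whole word
    train_lookup(lambda f: f["word"][-2:]),            # two-letter suffix
    train_lookup(lambda f: (f["prev_tag"], f["word"])),  # word in context
]


def ensemble_filter(corpus, trainers=TRAINERS, n_folds=5, min_votes=None):
    """Return indices of items whose annotated tag the ensemble rejects.

    Each item is tagged by classifiers trained on the other folds, so a
    noisy item never votes on itself. By default all classifiers must
    disagree with the annotation (consensus); lower min_votes for a
    majority-style filter.
    """
    if min_votes is None:
        min_votes = len(trainers)
    suspects = []
    for fold in range(n_folds):
        held_out = [i for i in range(len(corpus)) if i % n_folds == fold]
        training = [corpus[i] for i in range(len(corpus)) if i % n_folds != fold]
        if not held_out or not training:
            continue
        predictors = [train(training) for train in trainers]
        for i in held_out:
            features, tag = corpus[i]
            disagreements = sum(p(features) != tag for p in predictors)
            if disagreements >= min_votes:
                suspects.append(i)
    return sorted(suspects)


# Toy corpus (invented for this sketch): "the dog barks" repeated ten times.
sentence = [
    ({"word": "the", "prev_tag": "<s>"}, "DT"),
    ({"word": "dog", "prev_tag": "DT"}, "NN"),
    ({"word": "barks", "prev_tag": "NN"}, "VB"),
]
corpus = sentence * 10
corpus[4] = (corpus[4][0], "VB")  # deliberately mis-tag one "dog" as noise

suspects = ensemble_filter(corpus)
print(suspects)
```

Requiring consensus keeps the filter conservative: an item is flagged only when every classifier rejects its annotation, which reduces the risk of discarding correct but unusual taggings. Setting `min_votes=2` would give a majority filter that catches more noise at the cost of more false alarms.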