Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging

  • Authors:
  • Eric Brill

  • Affiliations:
  • The Johns Hopkins University

  • Venue:
  • Computational Linguistics
  • Year:
  • 1995

Abstract

Recently, there has been a rebirth of empiricism in the field of natural language processing. Manual encoding of linguistic information is being challenged by automated corpus-based learning as a method of providing a natural language processing system with linguistic knowledge. Although corpus-based approaches have been successful in many different areas of natural language processing, it is often the case that these methods capture the linguistic information they are modelling indirectly in large opaque tables of statistics. This can make it difficult to analyze, understand and improve the ability of these approaches to model underlying linguistic behavior. In this paper, we will describe a simple rule-based approach to automated learning of linguistic knowledge. This approach has been shown for a number of tasks to capture information in a clearer and more direct fashion without a compromise in performance. We present a detailed case study of this learning method applied to part-of-speech tagging.
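The learning method named in the title can be illustrated with a minimal sketch of a transformation-based, error-driven loop. The procedure is as follows. An initial-state annotator tags the training text. The loop then compares that output with the correct tags and repeatedly adopts the single rule that most reduces the remaining errors. The result is an ordered list of readable rules rather than a table of statistics. This is a simplified illustration under stated assumptions, not the paper's implementation. It uses one hypothetical rule template ("change tag A to B when the previous tag is Z"), assigns a hypothetical default tag `NN` to unknown words, and stops when no rule lowers the error count. The richer templates and stopping criteria described in the paper are not reproduced. All function names and the toy corpus are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical single template: (from_tag, to_tag, previous_tag)
Rule = Tuple[str, str, str]
TaggedSent = List[Tuple[str, str]]


def build_lexicon(tagged_sents: List[TaggedSent]) -> Dict[str, str]:
    """Map each training word to its most frequent tag."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def initial_tags(words: List[str], lexicon: Dict[str, str], default: str = "NN") -> List[str]:
    """Initial-state annotator: most frequent tag, or an assumed default for unknown words."""
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply one rule; conditions are checked against the tags as they were before this rule."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(predicted: List[List[str]], gold: List[List[str]]) -> int:
    return sum(p != g for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs))


def learn(tagged_sents: List[TaggedSent], max_rules: int = 10) -> Tuple[Dict[str, str], List[Rule]]:
    """Greedily learn an ordered rule list that reduces tagging errors on the training data."""
    lexicon = build_lexicon(tagged_sents)
    gold = [[t for _, t in s] for s in tagged_sents]
    current = [initial_tags([w for w, _ in s], lexicon) for s in tagged_sents]
    rules: List[Rule] = []
    for _ in range(max_rules):
        # Propose candidate rules only at positions that are currently mistagged.
        candidates = set()
        for ps, gs in zip(current, gold):
            for i in range(1, len(ps)):
                if ps[i] != gs[i]:
                    candidates.add((ps[i], gs[i], ps[i - 1]))
        base = count_errors(current, gold)
        best, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = base - count_errors([apply_rule(ps, rule) for ps in current], gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule reduces errors: stop
            break
        rules.append(best)
        current = [apply_rule(ps, best) for ps in current]
    return lexicon, rules


def tag(words: List[str], lexicon: Dict[str, str], rules: List[Rule]) -> List[str]:
    """Tag new text: initial-state annotation, then the learned rules in order."""
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Illustrative toy corpus (not data from the paper).
TRAIN: List[TaggedSent] = [
    [("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
    [("I", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("you", "PRP"), ("can", "MD"), ("go", "VB")],
]
```

On this toy corpus, "can" is most often a modal, so the initial annotator tags "the can" as `DT MD`. The loop then learns the single rule `("MD", "NN", "DT")`, which retags "can" as a noun after a determiner. The learned knowledge is therefore a short, inspectable rule rather than a set of probabilities, which is the property the abstract emphasizes.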