Example-based correction of word segmentation and part of speech labelling

  • Authors:
  • Tomoyoshi Matsukawa;Scott Miller;Ralph Weischedel

  • Affiliations:
  • BBN Systems and Technologies, Cambridge, MA;BBN Systems and Technologies, Cambridge, MA;BBN Systems and Technologies, Cambridge, MA

  • Venue:
  • HLT '93 Proceedings of the workshop on Human Language Technology
  • Year:
  • 1993

Abstract

This paper describes an example-based correction component for Japanese word segmentation and part of speech labelling (AMED), and a way of combining it with a pre-existing rule-based Japanese morphological analyzer and a probabilistic part of speech tagger.

Statistical algorithms rely on frequency of phenomena or events in corpora; however, low frequency events are often inadequately represented. Here we report on an example-based technique used in finding word segments and their part of speech in Japanese text. Rather than using hand-crafted rules, the algorithm employs example data, drawing generalizations during training.