Example-based correction of word segmentation and part of speech labelling

Authors:
Tomoyoshi Matsukawa;Scott Miller;Ralph Weischedel
Affiliations:
BBN Systems and Technologies, Cambridge, MA;BBN Systems and Technologies, Cambridge, MA;BBN Systems and Technologies, Cambridge, MA
Venue:
HLT '93 Proceedings of the workshop on Human Language Technology
Year:
1993

Abstract

This paper describes an example-based correction component (AMED) for Japanese word segmentation and part of speech labelling, and a way of combining it with a pre-existing rule-based Japanese morphological analyzer and a probabilistic part of speech tagger. Statistical algorithms rely on the frequency of phenomena or events in corpora; however, low-frequency events are often inadequately represented. Here we report on an example-based technique for finding word segments and their parts of speech in Japanese text. Rather than using hand-crafted rules, the algorithm employs example data, drawing generalizations during training.
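
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of example-based correction, not the AMED component itself. It assumes that training examples are pairs of (erroneous analyzer output, hand-corrected output), each a short sequence of (word, part of speech) tokens, and that correction means replacing the longest matching erroneous fragment with its most frequently observed fix. The function names, the tag labels, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_corrections(examples):
    """Map each erroneous token sequence to its most frequent correction.

    `examples` is an iterable of (wrong, right) pairs, where each side is a
    list of (word, pos) tuples. This pairing is an assumption of the sketch.
    """
    counts = defaultdict(Counter)
    for wrong, right in examples:
        counts[tuple(wrong)][tuple(right)] += 1
    return {wrong: c.most_common(1)[0][0] for wrong, c in counts.items()}


def apply_corrections(tokens, table):
    """Rewrite analyzer output left to right, preferring the longest match."""
    max_len = max((len(key) for key in table), default=0)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.extend(table[key])  # substitute the learned correction
                i += n
                break
        else:
            out.append(tokens[i])  # no example matches; keep the token
            i += 1
    return out


# Hypothetical training example: a mis-segmentation and its hand correction.
examples = [
    ([("東", "noun"), ("京都", "proper-noun")],
     [("東京", "proper-noun"), ("都", "suffix")]),
]
table = learn_corrections(examples)

analyzer_output = [("東", "noun"), ("京都", "proper-noun"), ("に", "particle")]
corrected = apply_corrections(analyzer_output, table)
print(corrected)
```

In this toy run the first two tokens are re-segmented according to the stored example and the trailing particle passes through unchanged. An exact-match table like this does not generalize beyond the fragments it has seen, whereas the abstract states that the algorithm draws generalizations during training; that step is not modelled here.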