Parsing noisy sentences

  • Authors: Hiroaki Saito, Masaru Tomita
  • Affiliation: Center for Machine Translation, Carnegie Mellon University, Pittsburgh, PA
  • Venue: COLING '88 Proceedings of the 12th conference on Computational linguistics - Volume 2
  • Year: 1988

Abstract

This paper describes a method for parsing and understanding a "noisy" sentence, that is, one that may contain errors introduced by a speech recognition device. Our parser is connected to a speech recognition device that takes a continuously spoken Japanese sentence and produces a sequence of phonemes. This output sequence may well contain errors: altered phonemes, extra phonemes, and missing phonemes. The task is to parse the noisy phoneme sequence and understand the meaning of the original input sentence, given an augmented context-free grammar whose terminal symbols are phonemes. A very efficient parsing method is required, because the search space is much larger than that of parsing noise-free sentences. We adopt the generalized LR parsing algorithm, together with a scoring scheme that selects the most likely sentence from among multiple candidates. We also discuss the use of a confusion matrix, created in advance by analyzing a large set of input/output pairs, to improve scoring accuracy. The system has been integrated into CMU's knowledge-based machine translation system.
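
As a rough illustration of how a confusion matrix could feed a scoring scheme for altered, extra, and missing phonemes, the sketch below scores candidate phoneme sequences against an observed one and picks the most likely candidate. It is not the paper's parser: it uses a plain dynamic-programming alignment rather than generalized LR parsing, and every name and number in it (`CONFUSION`, `P_MATCH`, `P_OTHER`, `P_EXTRA`, `P_MISSING`, `score`, `most_likely`) is a hypothetical placeholder, not a value or identifier from the paper.

```python
import math

# Hypothetical confusion matrix: P(observed phoneme | intended phoneme).
# In practice such a table would be estimated from many input/output pairs;
# the entries here are made up for illustration.
CONFUSION = {("g", "k"): 0.2, ("k", "g"): 0.2}

P_MATCH = 0.8     # assumed probability that a phoneme is recognized correctly
P_OTHER = 0.01    # assumed probability of any confusion not listed above
P_EXTRA = 0.05    # assumed probability of a spurious (extra) phoneme
P_MISSING = 0.05  # assumed probability that a phoneme is dropped (missing)


def sub_prob(intended, observed):
    """Probability of observing `observed` when `intended` was spoken."""
    if (intended, observed) in CONFUSION:
        return CONFUSION[(intended, observed)]
    return P_MATCH if intended == observed else P_OTHER


def score(intended, observed):
    """Best log-likelihood of aligning an intended phoneme sequence with the
    observed one, allowing altered, extra, and missing phonemes."""
    n, m = len(intended), len(observed)
    dp = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # phoneme kept or altered
                step = math.log(sub_prob(intended[i - 1], observed[j - 1]))
                dp[i][j] = max(dp[i][j], dp[i - 1][j - 1] + step)
            if i > 0:  # intended phoneme missing from the output
                dp[i][j] = max(dp[i][j], dp[i - 1][j] + math.log(P_MISSING))
            if j > 0:  # extra phoneme in the output
                dp[i][j] = max(dp[i][j], dp[i][j - 1] + math.log(P_EXTRA))
    return dp[n][m]


def most_likely(candidates, observed):
    """Return the candidate sequence with the highest alignment score."""
    return max(candidates, key=lambda c: score(c, observed))


observed = list("kaiki")                      # noisy recognizer output
candidates = [list("kaigi"), list("taiki")]   # sequences the grammar accepts
best = most_likely(candidates, observed)
print("".join(best))
```

In this toy setup the candidates are listed by hand, whereas the method described in the abstract derives them by parsing the phoneme sequence with a grammar whose terminal symbols are phonemes.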