Detecting sentence boundaries in Japanese speech transcriptions using a morphological analyzer

  • Authors:
  • Sachie Tajima; Hidetsugu Nanba; Manabu Okumura

  • Affiliations:
  • Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Japan; Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan; Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Japan

  • Venue:
  • IJCNLP'04: Proceedings of the First International Joint Conference on Natural Language Processing
  • Year:
  • 2004

Abstract

We present a method to automatically detect sentence boundaries (SBs) in Japanese speech transcriptions. Our method uses a Japanese morphological analyzer that is based on cost calculation and selects the analysis with the minimum cost as the best result. The idea behind using a morphological analyzer to identify candidate SBs is that the analyzer outputs lower costs for better sequences of morphemes. After the candidate SBs have been identified, unsuitable candidates are deleted using lexical information acquired from the training corpus. Our method achieved 77.24% precision, 88.00% recall, and an F-measure of 0.8277 on a corpus of lecture speech transcriptions in which the SBs are not given.
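
The two-stage idea in the abstract (propose candidate SBs wherever a split lowers the analysis cost, then prune candidates with lexical information) can be illustrated with a minimal sketch. This is not the authors' implementation: the romanized morphemes, the connection-cost table, the default cost, and the `NON_FINAL` set are all hypothetical values invented for illustration, and the local cost comparison is an assumed simplification of how a cost-based analyzer might be used.

```python
# Hypothetical connection costs between adjacent morphemes (lower is better).
# A real cost-based analyzer derives such costs from its dictionary;
# these numbers are invented for illustration only.
BOS, EOS = "<BOS>", "<EOS>"
DEFAULT_COST = 4  # assumed cost for any pair missing from the table

CONNECTION_COST = {
    ("kyou", "wa"): 1, ("wa", "hare"): 1, ("hare", "desu"): 1,
    ("desu", "ashita"): 6, ("ashita", "wa"): 1, ("wa", "ame"): 3,
    ("ame", "desu"): 1,
    ("desu", EOS): 1, (BOS, "ashita"): 1,
    ("wa", EOS): 1, (BOS, "ame"): 1,
}


def cost(left, right):
    """Connection cost of placing `right` directly after `left`."""
    return CONNECTION_COST.get((left, right), DEFAULT_COST)


def candidate_boundaries(morphemes):
    """Return indices i where a boundary before morphemes[i] lowers the cost.

    Splitting replaces the connection (left, right) with
    (left, EOS) and (BOS, right); it is a candidate if that is cheaper.
    """
    candidates = []
    for i in range(1, len(morphemes)):
        left, right = morphemes[i - 1], morphemes[i]
        joined = cost(left, right)
        split = cost(left, EOS) + cost(BOS, right)
        if split < joined:
            candidates.append(i)
    return candidates


def filter_candidates(morphemes, candidates, non_final):
    """Drop candidates whose preceding morpheme is in `non_final`.

    `non_final` stands in for lexical information from a training corpus:
    morphemes assumed to appear rarely at the end of a sentence.
    """
    return [i for i in candidates if morphemes[i - 1] not in non_final]


tokens = ["kyou", "wa", "hare", "desu", "ashita", "wa", "ame", "desu"]
NON_FINAL = {"wa"}  # hypothetical: a topic particle rarely ends a sentence

candidates = candidate_boundaries(tokens)
boundaries = filter_candidates(tokens, candidates, NON_FINAL)
print(candidates)  # [4, 6]
print(boundaries)  # [4]
```

In this toy run the cost comparison proposes boundaries before "ashita" and before "ame"; the lexical filter then removes the second one because it would leave "wa" in sentence-final position.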