Sentence boundary detection in conversational speech transcripts using noisily labeled examples

  • Authors:
  • Hironori Takeuchi; L. Venkata Subramaniam; Shourya Roy; Diwakar Punjani; Tetsuya Nasukawa

  • Affiliations:
  • IBM Tokyo Research Lab, Shimotsuruma 1623-14, Yamato-shi, Kanagawa, Japan; IBM India Research Lab, Plot 4, Block-C, Institutional Area, Vasant Kunj, New Delhi, India; IBM India Research Lab, Plot 4, Block-C, Institutional Area, Vasant Kunj, New Delhi, India; IBM India Research Lab, Plot 4, Block-C, Institutional Area, Vasant Kunj, New Delhi, India; IBM Tokyo Research Lab, Shimotsuruma 1623-14, Yamato-shi, Kanagawa, Japan

  • Venue:
  • International Journal on Document Analysis and Recognition
  • Year:
  • 2007

Abstract

This paper presents a technique for adding sentence boundaries to text obtained by Automatic Speech Recognition (ASR) of conversational speech audio. We show that, starting with imprecise boundary information added using only silence information from an ASR system, we can improve boundary detection using Head and Tail phrases. We develop our technique and show its effectiveness on two manually transcribed corpora and one automatically transcribed corpus. The main purpose of adding sentence boundaries to ASR transcripts is to improve linguistic analysis, namely information extraction, for text mining systems that handle huge volumes of textual data and analyze trends and features of concepts. Hence, we also show how the addition of boundaries improves two basic natural language processing tasks: part-of-speech (PoS) label assignment and adjective-noun extraction.
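
The abstract does not give the algorithm, so the following is only a minimal sketch of one way the idea could work, not the authors' method. It assumes that a Head phrase is the first n words of a silence-delimited segment, that a Tail phrase is its last n words, and that phrases seen often enough in the noisy segments are reliable boundary cues. The function names, the n-gram length `n`, and the threshold `min_count` are all hypothetical.

```python
from collections import Counter


def learn_boundary_phrases(segments, n=2, min_count=2):
    """Collect frequent head and tail n-grams from noisily labeled segments.

    Assumption: `segments` are strings produced by splitting on silence,
    so their first and last words are only approximate sentence edges.
    """
    heads, tails = Counter(), Counter()
    for segment in segments:
        words = segment.lower().split()
        if len(words) >= n:
            heads[tuple(words[:n])] += 1
            tails[tuple(words[-n:])] += 1
    frequent_heads = {p for p, c in heads.items() if c >= min_count}
    frequent_tails = {p for p, c in tails.items() if c >= min_count}
    return frequent_heads, frequent_tails


def add_boundaries(words, heads, tails, n=2):
    """Split a word list before a known head phrase or after a known tail phrase."""
    sentences, start = [], 0
    for i in range(1, len(words)):
        starts_with_head = tuple(words[i:i + n]) in heads
        ends_with_tail = i >= n and tuple(words[i - n:i]) in tails
        if starts_with_head or ends_with_tail:
            sentences.append(words[start:i])
            start = i
    sentences.append(words[start:])
    return sentences


# Toy, made-up segments standing in for silence-delimited ASR output.
noisy_segments = [
    "thank you for calling",
    "how may i help you today",
    "thank you for holding",
    "how may i help you today",
]
heads, tails = learn_boundary_phrases(noisy_segments)

unsegmented = "thank you for calling how may i help you today i need a car".split()
for sentence in add_boundaries(unsegmented, heads, tails):
    print(" ".join(sentence))
```

On this toy input the sketch splits before "how may" (a learned head phrase) and after "you today" (a learned tail phrase). A real system would need a more careful scoring rule than this simple either-or test, since a common phrase such as "thank you" can also occur in the middle of a sentence.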