Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
The availability of large-scale spontaneous speech corpora is crucial for many areas of spoken language processing. However, such corpora are usually limited because they are costly to prepare. On the other hand, inexactly transcribed corpora, such as shorthand notes, meeting records, and closed captions, are widely produced. Although these corpora are more freely available than faithful (exact) ones, they are not faithful transcriptions: they contain edited text. Against this background, we aim to build an efficient semi-automatic framework for converting inexact transcripts into faithful, exact ones. The framework consists of two steps: first, automatically detecting the positions of edited parts, and second, manually transcribing those parts. This paper proposes a method for automatically detecting edited parts in edited transcribed corpora for use in this framework. In the proposed method, the edited transcription is automatically aligned with its corresponding utterance, and a support vector machine (SVM)-based detector then detects edited parts using features obtained from the alignment. An evaluation on the Japanese National Diet Record yielded reasonable results under the speaker-closed condition.
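The align-then-classify idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The word-level alignment uses the standard library's `difflib.SequenceMatcher` as a stand-in for the paper's alignment between transcript and utterance, and the recognized word sequence is assumed to come from a speech recognizer. The feature set in `alignment_features` is hypothetical. The classifier is a small Pegasos-style linear SVM rather than whatever SVM toolkit the paper used. The toy sentence and its labels are invented for illustration.

```python
import difflib
import random


def alignment_features(transcript, recognized):
    """Hypothetical per-word features from aligning an edited transcript
    with a (assumed) speech-recognizer word sequence.

    For transcript word i the vector is
    [mismatch_i, mismatch_{i-1}, mismatch_{i+1}, extra_before_i, 1.0]:
    mismatch marks a transcript word the aligner could not match,
    extra_before counts recognized tokens (e.g. fillers dropped by the
    editor) that appear just before word i, and the constant 1.0 serves
    as the SVM bias term.
    """
    n = len(transcript)
    mismatch = [0.0] * n
    extra_before = [0.0] * (n + 1)  # slot n holds tokens after the last word
    matcher = difflib.SequenceMatcher(a=transcript, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            # Transcript words with no exact counterpart in the recognition.
            for i in range(i1, i2):
                mismatch[i] = 1.0
        elif tag == "insert":
            # Recognized words missing from the transcript, located before word i1.
            extra_before[i1] += j2 - j1
    features = []
    for i in range(n):
        left = mismatch[i - 1] if i > 0 else 0.0
        right = mismatch[i + 1] if i + 1 < n else 0.0
        features.append([mismatch[i], left, right, extra_before[i], 1.0])
    return features


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Pegasos-style stochastic subgradient descent on the primal objective
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * w.x_i), with labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # hinge loss is active
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """+1 = position flagged as edited (needs manual transcription)."""
    return 1 if dot(w, x) > 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: the edited record drops the filler "uh"
    # and inserts "the", which was not spoken.
    transcript = ["we", "will", "discuss", "the", "budget"]
    recognized = ["we", "uh", "will", "discuss", "budget"]
    labels = [-1, 1, -1, 1, -1]  # hypothetical hand labels
    X = alignment_features(transcript, recognized)
    w = train_linear_svm(X, labels)
    print([predict(w, x) for x in X])
```

Folding the bias into a constant feature keeps the trainer short, at the cost of regularizing the bias along with the other weights.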