A monotonic statistical machine translation approach to speaking style transformation

Authors:
Graham Neubig;Yuya Akita;Shinsuke Mori;Tatsuya Kawahara
Affiliations:
Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan;Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan;Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan;Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan
Venue:
Computer Speech and Language
Year:
2012

Citing 20
Cited 0

The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Finite-state transducers in language and speech processing

Computational Linguistics
Decoding complexity in word-replacement translation models

Computational Linguistics
A speech-first model for repair detection and correction

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
An empirical study of smoothing techniques for language modeling

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Integrating multiple knowledge sources for detection and correction of repairs in human-computer dialog

ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
Discriminative training and maximum entropy models for statistical machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Generalized algorithms for constructing statistical language models

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
The Alignment Template Approach to Statistical Machine Translation

Computational Linguistics
Machine Translation with Inferred Stochastic Finite-State Transducers

Computational Linguistics
Dependency structure analysis and sentence boundary detection in spontaneous Japanese

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Restoring punctuation and capitalization in transcribed speech

ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
A lexically-driven algorithm for disfluency detection

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Reconstructing spontaneous speech

Reconstructing spontaneous speech
OpenFst: a general and efficient weighted finite-state transducer library

CIAA'07 Proceedings of the 12th international conference on Implementation and application of automata
Formatting time-aligned ASR transcripts for readability

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Statistical transformation of language and pronunciation models for spontaneous speech recognition

IEEE Transactions on Audio, Speech, and Language Processing
Enriching speech recognition with automatic detection of sentence boundaries and disfluencies

IEEE Transactions on Audio, Speech, and Language Processing
Edit disfluency detection and correction using a cleanup language model and an alignment model

IEEE Transactions on Audio, Speech, and Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents a method for automatically transforming faithful transcripts or ASR results into clean transcripts for human consumption using a framework we label speaking style transformation (SST). We perform a detailed analysis of the types of corrections performed by human stenographers when creating clean transcripts, and propose a model that is able to handle the majority of the most common corrections. In particular, the proposed model uses a framework of monotonic statistical machine translation to perform not only the deletion of disfluencies and insertion of punctuation, but also correction of colloquial expressions, insertions of omitted words, and other transformations. We provide a detailed description of the model implementation in the weighted finite state transducer (WFST) framework. An evaluation of the proposed model on both faithful transcripts and speech recognition results of parliamentary and lecture speech demonstrates the effectiveness of the proposed model in performing the wide variety of corrections necessary for creating clean transcripts.