Automatic corpus-based tone and break-index prediction using K-ToBI representation

Authors:
Jin-Seok Lee;Byeongchang Kim;Gary Geunbae Lee
Affiliations:
KOSCOM, Seoul, South Korea;Uiduk University, Kyongju, South Korea;Pohang University of Science & Technology, Pohang, South Korea
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2002

Abstract

In this article we present a prosody generation architecture based on the K-ToBI (Korean Tone and Break Index) representation. ToBI is a multitier representation system, grounded in linguistic knowledge, that transcribes prosodic events in an utterance. TTS (text-to-speech) systems that adopt ToBI as an intermediate representation are known to exhibit higher flexibility, modularity, and domain/task portability than TTS systems that generate prosody directly. However, reaching practical-level performance makes corpus preparation very expensive: a ToBI-labeled corpus is built manually by many prosody experts, and statistical prosody modeling normally requires large amounts of data. Unlike previous ToBI-based systems, the method proposed in this article transcribes K-ToBI labels in Korean speech completely automatically. We develop automatic corpus-based K-ToBI labeling tools and prediction methods that use several lexico-syntactic linguistic features for decision-tree induction. We evaluate F0 generation from the automatically predicted K-ToBI labels and confirm that its performance is reasonably comparable to state-of-the-art direct prosody generation methods and to previous ToBI-based methods.
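As a rough illustration of what decision-tree induction over lexico-syntactic features can look like, the sketch below grows a small tree from categorical features and uses it to predict a break-index label for each word. It is not the authors' system: the feature names (`last_pos`, `next_pos`, `punct`), the toy training rows, and the labels are all hypothetical, and the splitting criterion is plain ID3-style information gain, chosen only to keep the example short.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_feature(rows, labels, features):
    """Return the feature with the highest information gain."""
    base = entropy(labels)

    def gain(feature):
        remainder = 0.0
        for value in {row[feature] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(features, key=gain)


def build_tree(rows, labels, features):
    """Recursively grow a tree; leaves are labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    feature = best_feature(rows, labels, features)
    node = {"feature": feature, "default": majority, "children": {}}
    remaining = [f for f in features if f != feature]
    for value in sorted({row[feature] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[feature] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return node


def predict(tree, row):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical toy data: one row per word, with made-up feature values.
# The labels are illustrative break indices, not taken from any real corpus.
rows = [
    {"last_pos": "noun", "next_pos": "particle", "punct": "none"},
    {"last_pos": "particle", "next_pos": "noun", "punct": "none"},
    {"last_pos": "particle", "next_pos": "verb", "punct": "none"},
    {"last_pos": "verb", "next_pos": "ending", "punct": "none"},
    {"last_pos": "ending", "next_pos": "noun", "punct": "comma"},
    {"last_pos": "ending", "next_pos": "none", "punct": "period"},
    {"last_pos": "noun", "next_pos": "particle", "punct": "none"},
]
labels = [1, 2, 2, 1, 3, 3, 1]

tree = build_tree(rows, labels, ["last_pos", "next_pos", "punct"])
query = {"last_pos": "particle", "next_pos": "adverb", "punct": "none"}
predicted_break = predict(tree, query)
```

A realistic setup would replace the toy rows with features extracted from a labeled corpus and would likely use a more robust learner (for example, one with gain-ratio splitting and pruning); the sketch only shows the shape of the feature-to-label mapping.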