One approach to generating natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be treated as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search selects the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: we present two methods for training from speech, both of which yield weights that produce more natural speech than can be obtained by hand-tuning.
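The search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Unit` type, the pitch-only cost functions, the `beam` parameter, and the toy data are all assumptions made for illustration. The sketch treats each candidate unit as a state, adds a target (state occupancy) cost and a concatenation (transition) cost, and keeps only the cheapest `beam` hypotheses per position as a simple form of pruning. Any trained weights are assumed to be folded into the two cost functions.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple


class Unit(NamedTuple):
    """Hypothetical database unit: a phoneme label and one prosodic feature."""
    phoneme: str
    pitch: float


def select_units(
    targets: Sequence[Unit],
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[Unit, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
    beam: Optional[int] = None,
) -> Tuple[float, List[Unit]]:
    """Viterbi search for the unit sequence with the lowest total cost."""
    if not targets or len(targets) != len(candidates):
        raise ValueError("need one candidate list per target")

    # Each layer holds hypotheses (accumulated cost, unit, backpointer),
    # where the backpointer indexes into the previous (pruned) layer.
    layers: List[List[Tuple[float, Unit, int]]] = []
    for i, (target, units) in enumerate(zip(targets, candidates)):
        layer = []
        for unit in units:
            cost = target_cost(target, unit)  # state occupancy cost
            back = -1
            if i > 0:
                prev = layers[-1]
                # Cheapest predecessor, including the transition cost.
                back = min(
                    range(len(prev)),
                    key=lambda j: prev[j][0] + concat_cost(prev[j][1], unit),
                )
                cost += prev[back][0] + concat_cost(prev[back][1], unit)
            layer.append((cost, unit, back))
        if not layer:
            raise ValueError(f"no candidates for target {i}")
        layer.sort(key=lambda h: h[0])
        layers.append(layer[:beam] if beam else layer)  # beam pruning

    # Layers are sorted, so the best final hypothesis is at index 0.
    total = layers[-1][0][0]
    path: List[Unit] = []
    idx = 0
    for layer in reversed(layers):
        _, unit, back = layer[idx]
        path.append(unit)
        idx = back
    path.reverse()
    return total, path


# Toy example: costs are plain pitch differences (an assumption).
def pitch_distance(a: Unit, b: Unit) -> float:
    return abs(a.pitch - b.pitch)


targets = [Unit("h", 100.0), Unit("e", 110.0)]
candidates = [
    [Unit("h", 100.0), Unit("h", 120.0)],
    [Unit("e", 112.0), Unit("e", 110.0)],
]
best_cost, best_path = select_units(targets, candidates, pitch_distance, pitch_distance)
```

In this toy run the cheapest path takes the first "h" candidate and the second "e" candidate. A narrow beam can discard the globally best path, so the pruned search is an approximation rather than an exact minimisation.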