A new technology is proposed for audio-video synchronization in multimedia applications where talking human faces, either natural or synthetic, are employed for interpersonal communication services, home gaming, advanced multimodal interfaces, interactive entertainment, or movie production. Facial sequences represent an acoustic-visual source characterized by two strongly correlated components, a talking face and the associated speech, whose synchronous presentation must be guaranteed in any multimedia application. Therefore, the exact timing for displaying a video frame or for generating a synthetic facial image has to be supervised by some form of speech analysis, performed either as preprocessing before encoding or as postprocessing before presentation. Experimental results are reported on the use of time-delay neural networks (TDNNs) for directly estimating the visible articulation of the mouth from the coherent analysis of acoustic speech. The architecture employing a bank of independent single-output TDNNs is compared with the alternative of a single multi-output TDNN. Similarly, two learning procedures are applied and compared for training the TDNNs, the first based on the classic mean square error (MSE) and the second on a measure of cross-correlation (CC). The results show that the system based on multiple single-output TDNNs performs better, and that the learning algorithm based on cross-correlation improves both convergence speed and estimation fidelity.
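To make the difference between the two training criteria concrete, the following is a minimal sketch in plain Python. The abstract does not define the cross-correlation measure, so the zero-lag normalized (Pearson) correlation used here is an assumption, and the function names `mse`, `cross_correlation`, and `cc_loss` are hypothetical illustrations rather than the authors' implementation.

```python
import math


def mse(pred, target):
    """Mean square error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_correlation(pred, target):
    """Zero-lag normalized cross-correlation, in [-1, 1].

    Assumption: the abstract does not specify the CC measure;
    Pearson correlation is used here for illustration only.
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    norm = math.sqrt(
        sum((p - mean_p) ** 2 for p in pred)
        * sum((t - mean_t) ** 2 for t in target)
    )
    # A constant sequence has zero variance; report no correlation.
    return cov / norm if norm > 0 else 0.0


def cc_loss(pred, target):
    """Hypothetical CC-based cost: 0 when the trajectories are perfectly correlated."""
    return 1.0 - cross_correlation(pred, target)


# A toy mouth-opening trajectory and an estimate with the same shape
# but a different gain and offset.
target = [0.0, 1.0, 2.0, 1.0, 0.0]
scaled = [0.5 * t + 1.0 for t in target]

mse_value = mse(scaled, target)      # penalizes the gain and offset mismatch
cc_value = cc_loss(scaled, target)   # close to zero: the shape is matched
```

Under these assumptions, an estimate that tracks the shape of the target trajectory but differs in gain or offset is penalized by MSE and not by the CC-based cost, which is one plausible reading of why the two criteria can lead to different training behavior.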