The Persian linguistic based audio-visual data corpus, AVA II, considering coarticulation

Authors:
Azam Bastanfard;Maryam Fazel;Alireza Abdi Kelishami;Mohammad Aghaahmadi
Affiliations:
Information Technology Research Group, Department of Engineering, Islamic Azad University Karaj branch, Iran;Islamic Republic of Iran Broadcast University, Tehran, Iran;Department of Electrical, Computer and IT Engineering, Qazvin Islamic Azad University, Qazvin, Iran;Department of Electrical, Computer and IT Engineering, Qazvin Islamic Azad University, Qazvin, Iran
Venue:
MMM'10 Proceedings of the 16th international conference on Advances in Multimedia Modeling
Year:
2010

Abstract

Collecting an audio-visual data corpus based on linguistic rules is an essential step for major research in multimedia fields such as audio-visual speech recognition (AVSR), lip synchronization and visual speech synthesis. Building a reliable data corpus that covers all phonemes in all phonemic combinations of a language is a difficult and time-consuming task. To partially address this problem, this research uses VC, CV and VCV combinations (V: vowel, C: consonant), which carry the most language information, instead of all possible phonemic combinations. This paper describes the new data corpus, recorded from 14 respondents. To better capture the coarticulation effect in speech, continuous speech was considered beyond isolated and continuous digits. Overall, this design makes the collection process less time-consuming and costly while keeping efficiency high.
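As a rough illustration of why limiting a corpus to VC, CV and VCV combinations keeps its size manageable, the sketch below enumerates those combinations for a small inventory. The inventory, the function name `build_combinations` and the tuple representation are all assumptions made for this example; they are not the AVA II phoneme set or the authors' procedure.

```python
from itertools import product

# Hypothetical toy inventory for illustration only; NOT the AVA II phoneme set.
VOWELS = ["a", "e", "o"]
CONSONANTS = ["b", "d", "m", "s"]


def build_combinations(vowels, consonants):
    """Return every VC, CV and VCV phoneme sequence as a tuple of symbols."""
    vc = list(product(vowels, consonants))           # vowel + consonant
    cv = list(product(consonants, vowels))           # consonant + vowel
    vcv = list(product(vowels, consonants, vowels))  # vowel + consonant + vowel
    return {"VC": vc, "CV": cv, "VCV": vcv}


combos = build_combinations(VOWELS, CONSONANTS)
counts = {kind: len(items) for kind, items in combos.items()}
print(counts)  # {'VC': 12, 'CV': 12, 'VCV': 36}
```

With V vowels and C consonants, this yields 2·V·C + V²·C items, so the number of items to record grows with the product of the inventory sizes rather than with every possible phoneme sequence.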