This article investigates the correlations between the multimedia objects (particularly speech and text) involved in language lectures, in order to design an effective presentation mechanism for web-based learning. The cross-media correlations are classified into implicit relations (derived by computation) and explicit relations (recorded during the preprocessing stage). The implicit temporal correlation between speech and text primarily drives supplementary lecture-navigation features such as tele-pointer movement, lip-sync animation, and content scrolling. We propose a speech-text alignment framework that uses an iterative algorithm based on local alignment to detect many-to-one temporal correlations, not merely one-to-one ones. The proposed framework is a practical method for analyzing general language lectures, and the algorithm's time complexity matches the best achievable bound, O(nm), without incurring additional computation. In addition, we demonstrate the feasibility of creating vivid presentations by exploiting implicit relations and artificially simulating some explicit media. To facilitate navigation of integrated multimedia documents, we develop several visualization techniques for describing media correlations, including guidelines for speech-text correlations, visible automatic scrolling, and levels of detail on the timeline, to provide intuitive, easy-to-use random-access mechanisms. We evaluated both the performance of the analysis method and human perception of the synchronized presentation: about 99.5% of the analyzed words have a temporal error within 0.5 sec, and the subjective evaluation shows that the synchronized presentation is highly acceptable to viewers.
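The abstract names local alignment as the building block of the iterative speech-text alignment algorithm and states its O(nm) cost. Below is a minimal sketch of the classic Smith-Waterman-style local-alignment dynamic program over two token sequences; the token representation, scoring values, and function name are illustrative assumptions, not the authors' actual implementation.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between sequences a (length n) and
    b (length m), computed in O(nm) time and returned together with
    the (i, j) end position of the aligned region."""
    n, m = len(a), len(b)
    # (n+1) x (m+1) score table, zero-initialized: a local alignment
    # may start anywhere, so no negative prefix penalties accumulate.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # restart alignment here
                          H[i - 1][j - 1] + s,    # match / substitute
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos
```

In a speech-text setting, `a` could be tokens recognized from the audio and `b` the lecture script; iterating this local search over remaining unaligned regions is one way the many-to-one correlations described above could be recovered.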