This article investigates the correlations between the multimedia objects (particularly speech and text) involved in language lectures, in order to design an effective presentation mechanism for web-based learning. The cross-media correlations are classified into implicit relations (derived by computation) and explicit relations (recorded during the preprocessing stage). The implicit temporal correlation between speech and text primarily drives supplementary lecture-navigation features such as tele-pointer movement, lip-sync animation, and content scrolling. We propose a speech-text alignment framework that uses an iterative algorithm based on local alignment to detect many-to-one temporal correlations, not merely one-to-one ones. The proposed framework is a practical method for analyzing general language lectures, and the algorithm's time complexity matches the best achievable bound, O(nm), without incurring additional computation. In addition, we demonstrate the feasibility of creating vivid presentations by exploiting implicit relations and artificially simulating some explicit media. To facilitate navigation of integrated multimedia documents, we develop several visualization techniques for describing media correlations, including guidelines for speech-text correlations, visible automatic scrolling, and levels of detail on the timeline, to provide intuitive, easy-to-use random-access mechanisms. We evaluated both the performance of the analysis method and human perception of the synchronized presentation: about 99.5% of the analyzed words have a temporal error within 0.5 sec, and the subjective evaluation shows that the synchronized presentation is highly acceptable to viewers.
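The abstract names local alignment as the building block of the iterative speech-text alignment algorithm and states its O(nm) cost. Below is a minimal sketch of the classic Smith-Waterman-style local-alignment dynamic program over two token sequences; the token representation, scoring values, and function name are illustrative assumptions, not the authors' actual implementation.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between sequences a (length n) and
    b (length m), computed in O(nm) time and returned together with
    the (i, j) end position of the aligned region."""
    n, m = len(a), len(b)
    # (n+1) x (m+1) score table, zero-initialized: a local alignment
    # may start anywhere, so no negative prefix penalties accumulate.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # restart alignment here
                          H[i - 1][j - 1] + s,    # match / substitute
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos
```

In a speech-text setting, `a` could be tokens recognized from the audio and `b` the lecture script; iterating this local search over remaining unaligned regions is one way the many-to-one correlations described above could be recovered.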