Cross-lingual audio-to-text alignment for multimedia content management

Authors:
Dau-Cheng Lyu;Ren-Yuan Lyu;Yuang-Chin Chiang;Chun-Nan Hsu
Affiliations:
Department of Electrical Engineering, Chang Gung University, Taiwan and Institute of Information Science, Academia Sinica, Taiwan;Department of Computer Science and Information Engineering, Chang Gung University, Taiwan;Institute of Statistics, National Tsing Hua University, Taiwan;Institute of Information Science, Academia Sinica, Taiwan
Venue:
Decision Support Systems
Year:
2008

Citing 9
Cited 1

Fundamentals of speech recognition
Audio Feature Extraction and Analysis for Scene Segmentation and Classification

Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A robust audio classification and segmentation method

MULTIMEDIA '01 Proceedings of the ninth ACM international conference on Multimedia
Automatic information extraction from semi-structured Web pages by pattern discovery

Decision Support Systems - Web retrieval and mining
The Segmentation and Classification of Story Boundaries in News Video

Proceedings of the IFIP TC2/WG2.6 Sixth Working Conference on Visual Database Systems: Visual and Multimedia Information Management
Cross-language spoken document retrieval using HMM-based retrieval model with multi-scale fusion

ACM Transactions on Asian Language Information Processing (TALIP)
Translating unknown queries with web corpora for cross-language information retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Speechbot: an experimental speech-based search engine for multimedia content on the web

IEEE Transactions on Multimedia

Building a term suggestion and ranking system based on a probabilistic analysis model and a semantic analysis graph

Decision Support Systems

Abstract

This paper addresses a content management problem in situations where we have a collection of spoken documents in audio stream format in one language and a collection of related text documents in another. In our case, we have a huge digital archive of audio broadcast news in Taiwanese, but its transcriptions are unavailable. Meanwhile, we have a collection of related text-based news stories, but they are written in Chinese characters. Due to the lack of a standard written form for Taiwanese, manual transcription of spoken documents is prohibitively expensive, and automatic transcription by speech recognition is infeasible because of its poor performance on spontaneous Taiwanese speech. We present an approximate solution that aligns Taiwanese spoken documents with related text documents in Mandarin. The idea is to take advantage of the abundance of Mandarin text documents available in our application to compensate for the limitations of speech recognition systems. Experimental results show that even though our speech recognizer for spontaneous Taiwanese performs poorly, our approach still achieves a high alignment accuracy (82.5%), which is sufficient for content management.
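
The abstract does not describe the alignment procedure itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each candidate text document has already been converted into the same token inventory as the recognizer output (for example, syllable labels). That conversion step, the token values, and the function names `similarity`, `align`, and `alignment_accuracy` are all hypothetical. The sketch scores noisy recognizer output against every candidate and picks the best match, which illustrates why a weak recognizer can still support document-level alignment.

```python
from difflib import SequenceMatcher


def similarity(recognized, reference):
    """Similarity in [0, 1] between two token sequences (assumed metric)."""
    return SequenceMatcher(None, recognized, reference, autojunk=False).ratio()


def align(recognized, candidates):
    """Return (doc_id, score) for the candidate that best matches the
    recognizer output. `candidates` maps a doc id to its token sequence."""
    scored = ((doc_id, similarity(recognized, tokens))
              for doc_id, tokens in candidates.items())
    return max(scored, key=lambda pair: pair[1])


def alignment_accuracy(predicted, gold):
    """Fraction of spoken documents aligned to the correct text document."""
    correct = sum(1 for key in gold if predicted.get(key) == gold[key])
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical token sequences; real input would come from a recognizer
    # and from a pronunciation conversion of the text documents.
    noisy_output = ["a", "b", "x", "d"]
    texts = {"doc1": ["a", "b", "c", "d"], "doc2": ["e", "f", "g", "h"]}
    print(align(noisy_output, texts))
```

Because only the relative ranking of candidates matters, individual recognition errors lower every score but need not change which document ranks first.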