Speechbot: an experimental speech-based search engine for multimedia content on the web

Authors:
J.-M. Van Thong;P. J. Moreno;B. Logan;B. Fidler;K. Maffey;M. Moores
Affiliations:
Compaq Comput. Corp., Cambridge, MA;-;-;-;-;-
Venue:
IEEE Transactions on Multimedia
Year:
2002

Cited by 11:

VideoQA: question answering on news video
MULTIMEDIA '03 Proceedings of the eleventh ACM international conference on Multimedia

The MIT spoken lecture processing project
HLT-Demo '05 Proceedings of HLT/EMNLP on Interactive Demonstrations

The need for virtual information managers in education
Computers & Education

Question-driven segmentation of lecture speech text: Towards intelligent e-learning systems
Journal of the American Society for Information Science and Technology

Cross-lingual audio-to-text alignment for multimedia content management
Decision Support Systems

Word Particles Applied to Information Retrieval
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval

Simultaneous Synchronization of Text and Speech for Broadcast News Subtitling
ISNN 2009 Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks - Part III

Combining LVCSR and vocabulary-independent ranked utterance retrieval for robust speech search
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

WAPS: An Audio Program Surveillance System for Large Scale Web Data Stream
WISM '09 Proceedings of the International Conference on Web Information Systems and Mining

Towards precise and robust automatic synchronization of live speech and its transcripts
Speech Communication

Treemaps to visualise and navigate speech audio
Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration

Abstract

As the Web transforms from a text-only medium into a more multimedia-rich medium, the need arises to search the multimedia content itself. In this paper, we present an audio and video search engine to tackle this problem. When no transcriptions are available, the engine uses speech recognition technology to index spoken audio and video files from the World Wide Web (WWW). If transcriptions (even imperfect ones) are available, we can also take advantage of them to improve the indexing process. Our engine indexes several thousand talk and news radio shows covering a wide range of topics and speaking styles, drawn from a selection of public Web sites with multimedia archives. Our Web site is similar in spirit to ordinary Web search sites: it contains an index, not the actual multimedia content. The audio from these shows suffers in acoustic quality due to bandwidth limitations, coding, compression, and poor acoustic conditions. Word error rate (WER) results obtained with appropriately trained acoustic models show remarkable resilience to the high compression, although many factors combine to raise the average WERs above those obtained on standard broadcast news benchmarks. We show that, even if the transcription is inaccurate, we can still achieve good retrieval performance for typical user queries (77.5%).
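For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit word alignment of the recognizer output against a reference transcript, and N is the number of reference words. The sketch below computes it with a word-level Levenshtein distance. It is a minimal illustration of the standard definition, not the scoring tool used for Speechbot (the abstract does not name one); the function name `word_error_rate` is hypothetical, and whitespace tokenization without text normalization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein alignment.

    Assumption: words are whitespace-separated tokens, compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # match or substitution
            )

    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the engine indexes talk radio shows"
    hypothesis = "the engine indexes top radio show"
    # Two substitutions (talk->top, shows->show) over 6 reference words.
    print(word_error_rate(reference, hypothesis))
```

Because insertions are not bounded by the reference length, WER can exceed 1.0; this is one reason high-WER transcripts can still be useful for retrieval, since a query only needs its key terms to be recognized somewhere in the show.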