Searching the audio notebook: keyword search in recorded conversations

Authors:
Peng Yu;Kaijiang Chen;Lie Lu;Frank Seide
Affiliations:
Microsoft Research Asia, Beijing, P.R.C.;Microsoft Research Asia, Beijing, P.R.C.;Microsoft Research Asia, Beijing, P.R.C.;Microsoft Research Asia, Beijing, P.R.C.
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Abstract

MIT's Audio Notebook added great value to the note-taking process by retaining audio recordings, e.g. during lectures or interviews. The key was to provide users with ways to quickly and easily access portions of interest in a recording, and several techniques not based on speech recognition were employed for this. In this paper we present a system that searches the audio recordings directly by key phrases. We have identified the user requirements as accurate ranking of phrase matches, domain independence, and reasonable response time. We address these requirements with a hybrid word/phoneme search in lattices and a supporting indexing scheme. We introduce the ranking criterion, a unified hybrid posterior-lattice representation, and the indexing algorithm for hybrid lattices. We present results for five different recording sets, including meetings, telephone conversations, and interviews. Our results show an average search accuracy of 84%, far higher than the less than 40% obtained by searching speech recognition transcripts directly.
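To make the idea of ranking phrase matches in a lattice more concrete, here is a minimal sketch in Python. It is not the system described in the paper: it uses a generic forward-backward pass over a toy word lattice to score each phrase match by its posterior probability, and it leaves out phoneme arcs, time stamps, and indexing entirely. The lattice, its weights, and the names `ARCS`, `forward_backward`, and `phrase_posteriors` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy lattice: (start_node, end_node, label, weight).
# Nodes are numbered in topological order; weights are made-up,
# unnormalized arc scores, not real recognizer output.
ARCS = [
    (0, 1, "audio", 3.0),
    (0, 1, "radio", 2.0),
    (1, 2, "notebook", 7.0),
    (1, 2, "note", 3.0),
    (2, 3, "search", 1.0),
]
START, END = 0, 3


def forward_backward(arcs, start, end):
    """Return forward (alpha) and backward (beta) scores for every node."""
    alpha = defaultdict(float)
    alpha[start] = 1.0
    # Visiting arcs by increasing start node finalizes alpha[s] before use.
    for s, e, _, w in sorted(arcs, key=lambda a: a[0]):
        alpha[e] += alpha[s] * w
    beta = defaultdict(float)
    beta[end] = 1.0
    # Visiting arcs by decreasing end node finalizes beta[e] before use.
    for s, e, _, w in sorted(arcs, key=lambda a: a[1], reverse=True):
        beta[s] += w * beta[e]
    return alpha, beta


def phrase_posteriors(arcs, start, end, phrase):
    """Find arc sequences spelling `phrase` and rank them by posterior.

    Returns a list of (first_node, last_node, posterior), best match first.
    """
    if not phrase:
        return []
    alpha, beta = forward_backward(arcs, start, end)
    total = alpha[end]  # summed score of all complete lattice paths
    out_arcs = defaultdict(list)
    for arc in arcs:
        out_arcs[arc[0]].append(arc)

    hits = []

    def extend(node, idx, first_node, weight):
        if idx == len(phrase):
            # Paths through this match, divided by all paths.
            posterior = alpha[first_node] * weight * beta[node] / total
            hits.append((first_node, node, posterior))
            return
        for _, e, label, w in out_arcs[node]:
            if label == phrase[idx]:
                extend(e, idx + 1, first_node, weight * w)

    for node in list(out_arcs):
        extend(node, 0, node, 1.0)
    return sorted(hits, key=lambda h: h[2], reverse=True)


if __name__ == "__main__":
    for hit in phrase_posteriors(ARCS, START, END, ["audio", "notebook"]):
        print(hit)
```

With these made-up weights, the query "audio notebook" matches one path from node 0 to node 2 whose posterior is 21/50 = 0.42. A search restricted to the single best transcript would return either a hit or nothing, with no graded score to rank by.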