Subword-based approaches for spoken document retrieval
Subword-based approaches for spoken document retrieval
Position specific posterior lattices for indexing speech
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Word and sub-word indexing approaches for reducing the effects of OOV queries on spoken audio
HLT '02 Proceedings of the second international conference on Human Language Technology Research
General indexation of weighted automata: application to spoken utterance retrieval
SpeechIR '04 Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Statistical lattice-based spoken document retrieval
ACM Transactions on Information Systems (TOIS)
A novel Chinese mandarin speech indexing method based on confusion network using tone information
WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing
Performance analysis for lattice-based speech indexing approaches using words and subword units
IEEE Transactions on Audio, Speech, and Language Processing
ACM Transactions on Speech and Language Processing (TSLP)
Hi-index | 0.00 |
MIT's Audio Notebook added great value to the note-taking process by retaining audio recordings, e.g. during lectures or interviews. The key was to provide users ways to quickly and easily access portions of interest in a recording. Several non-speech-recognition based techniques were employed. In this paper we present a system to search directly the audio recordings by key phrases. We have identified the user requirements as accurate ranking of phrase matches, domain independence, and reasonable response time. We address these requirements by a hybrid word/phoneme search in lattices, and a supporting indexing scheme. We will introduce the ranking criterion, a unified hybrid posterior-lattice representation, and the indexing algorithm for hybrid lattices. We present results for five different recording sets, including meetings, telephone conversations, and interviews. Our results show an average search accuracy of 84%, which is dramatically better than a direct search in speech recognition transcripts (less than 40% search accuracy).