Performance analysis for lattice-based speech indexing approaches using words and subword units

Authors:
Yi-Cheng Pan; Lin-Shan Lee
Affiliations:
MediaTek, Inc., Hsinchu, Taiwan and Graduate Institute of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; Graduate Institute of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

Lattice-based speech indexing approaches are attractive for the combination of short spoken segments, short queries, and low automatic speech recognition (ASR) accuracy, because lattices provide recognition alternatives and therefore tend to compensate for recognition errors. Position-specific posterior lattices (PSPLs) and confusion networks (CNs), two of the most popular lattice-based approaches, both require less disk space and are more efficient than raw lattices. When PSPLs and CNs are used in a word-based fashion, however, they cannot handle out-of-vocabulary (OOV) or rare-word queries. In this paper, we propose an efficient approach for constructing subword-based PSPLs (S-PSPLs) and CNs (S-CNs) and present a comprehensive performance analysis of PSPL and CN structures using both words and subword units, taking into account basic principles and structures, and supported by experimental results on Mandarin Chinese. S-PSPLs and S-CNs are shown to yield significant mean average precision (MAP) improvements over word-based PSPLs and CNs for both OOV and in-vocabulary queries while requiring much less disk space for indexing.
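
For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of how mean average precision (MAP) is commonly computed from ranked retrieval results. It uses the standard textbook definition and is not taken from the paper; the function names, the dictionary layout, and the toy document identifiers are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query (standard definition, assumed here).

    `ranked` is the retrieved list in rank order; `relevant` is the set of
    ground-truth relevant items. Relevant items that are never retrieved
    contribute zero, because the sum is divided by len(relevant).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all queries in `qrels`."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: two queries over made-up spoken-segment ids.
toy_runs = {"q1": ["s1", "s2", "s3"], "q2": ["s4", "s5"]}
toy_qrels = {"q1": {"s1", "s3"}, "q2": {"s5"}}
toy_map = mean_average_precision(toy_runs, toy_qrels)
```

In this toy case, query `q1` has average precision (1/1 + 2/3) / 2 and query `q2` has 1/2, so `toy_map` is their mean. Comparing such scores separately for OOV and in-vocabulary query sets mirrors the kind of comparison the abstract describes, although the paper's exact evaluation protocol is not given here.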