Unsupervised Pattern Discovery in Speech

Authors:
A. S. Park; J. R. Glass
Affiliations:
Comput. Sci. & Artificial Intell. Lab., Massachusetts Inst. of Technol., Cambridge, MA
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2008

Citing 0
Cited 10

Variability Tolerant Audio Motif Discovery
MMM '09 Proceedings of the 15th International Multimedia Modeling Conference on Advances in Multimedia Modeling

A Computational Model of Language Acquisition: the Emergence of Words
Fundamenta Informaticae - Cognitive Informatics, Cognitive Computing, and Their Denotational Mathematical Foundations (I)

Modelling early language acquisition skills: towards a general statistical learning mechanism
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Summarizing multiple spoken documents: finding evidence from untranscribed audio
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2

NLP on spoken documents without ASR
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

From audio recurrences to TV program structuring
AIEMPro '11 Proceedings of the 2011 ACM international workshop on Automated media analysis and production for novel TV services

Comparison of methods for language-dependent and language-independent query-by-example spoken term detection
ACM Transactions on Information Systems (TOIS)

A Computational Model of Language Acquisition: the Emergence of Words
Fundamenta Informaticae - Cognitive Informatics, Cognitive Computing, and Their Denotational Mathematical Foundations (I)

Bootstrapping a unified model of lexical and phonetic acquisition
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1

Joint training of non-negative Tucker decomposition and discrete density hidden Markov models
Computer Speech and Language

Abstract

We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a prespecified inventory of lexical units (i.e., phones or words). Instead, we attempt to discover such an inventory in an unsupervised manner by exploiting the structure of repeating patterns within the speech signal. We show how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multiword phrases. On a corpus of academic lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream.
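The "segmental variant of a widely used dynamic programming technique" mentioned in the abstract can be illustrated with a simplified sketch of band-constrained dynamic time warping run from several start points. This is an illustration under stated assumptions, not the authors' implementation: the function names (`banded_dtw`, `segmental_dtw`), the Euclidean frame distance, the default start-point spacing of `2 * band + 1`, and the truncation of each alignment to a square region are all choices made here for brevity. The later steps the abstract describes (extracting low-distortion path fragments and clustering them) are omitted.

```python
import numpy as np


def banded_dtw(a, b, band):
    """DTW between equal-length frame sequences a and b, with the warp
    path confined to |i - j| <= band. Returns (path, mean_distortion)."""
    k = len(a)
    # Euclidean distance between every pair of frames.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((k, k), np.inf)
    back = {}
    cost[0, 0] = dist[0, 0]
    for i in range(k):
        for j in range(max(0, i - band), min(k, i + band + 1)):
            if i == 0 and j == 0:
                continue
            # Pick the cheapest of the three allowed predecessors.
            best, prev = np.inf, None
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi, pj] < best:
                    best, prev = cost[pi, pj], (pi, pj)
            cost[i, j] = best + dist[i, j]
            back[(i, j)] = prev
    # Trace the path back from the end of the diagonal to the origin.
    path = [(k - 1, k - 1)]
    while path[-1] != (0, 0):
        path.append(back[path[-1]])
    path.reverse()
    return path, cost[k - 1, k - 1] / len(path)


def segmental_dtw(x, y, band=2, step=None):
    """Run banded DTW from start points spaced along both axes.

    x and y are 2-D arrays of shape (frames, features). Returns a list of
    (start_i, start_j, path, mean_distortion) tuples, one per start point.
    The spacing `step` is an assumption of this sketch."""
    step = step or 2 * band + 1
    starts = [(i, 0) for i in range(0, len(x), step)]
    starts += [(0, j) for j in range(step, len(y), step)]
    results = []
    for si, sj in starts:
        # Simplification: align only the square region along this diagonal.
        k = min(len(x) - si, len(y) - sj)
        path, mean = banded_dtw(x[si:si + k], y[sj:sj + k], band)
        shifted = [(si + i, sj + j) for i, j in path]
        results.append((si, sj, shifted, mean))
    return results
```

In this sketch, a pair of utterances that share a repeated word would be expected to produce at least one start point whose path has low mean distortion over the region where the word occurs; candidate matches could then be ranked by that value.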