Language Acquisition: The Emergence of Words from Multimodal Input
TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue
The discovery of words by young infants involves two interrelated processes: (a) the detection of recurrent word-like acoustic patterns in the speech signal, and (b) the cross-modal association of auditory and visual information. This paper describes experimental results obtained with a computational model that simulates these two processes. The model builds word-like representations from multimodal input data (stimuli) without the help of an a priori specified lexicon. Each input stimulus consists of a speech signal accompanied by an abstract visual representation of the concepts referred to in that signal. In this paper we investigate how the internal representations generalize across speakers, and in doing so we also analyze the cognitive plausibility of the model.
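To make the learning setup concrete, the following is a minimal illustrative sketch (not the paper's implementation) of how word-like patterns can be coupled to visual concepts by factorizing a joint audio-visual data matrix with non-negative matrix factorization. All dimensions, the synthetic data, and the feature encoding are assumptions chosen for demonstration only.

```python
import numpy as np

# Hypothetical setup: each column of V stacks an utterance's acoustic
# feature histogram on top of a one-hot vector of the visual tag it
# co-occurs with. Shared latent components then tie recurrent acoustic
# patterns to visual concepts, without any predefined lexicon.

rng = np.random.default_rng(0)

n_acoustic = 20   # size of the (hypothetical) acoustic histogram
n_tags = 3        # number of visual concepts ("keywords")
n_utt = 60        # number of multimodal stimuli

# Synthetic ground truth: one characteristic acoustic pattern per concept.
true_patterns = rng.random((n_acoustic, n_tags))
tags = rng.integers(0, n_tags, size=n_utt)
audio = true_patterns[:, tags] + 0.05 * rng.random((n_acoustic, n_utt))
visual = np.eye(n_tags)[tags].T          # one-hot visual tag vectors
V = np.vstack([audio, visual])           # joint multimodal data matrix

# Standard multiplicative updates for V ~ W @ H (Lee-Seung style NMF).
k = n_tags
W = rng.random((V.shape[0], k)) + 0.1
H = rng.random((k, n_utt)) + 0.1
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# The visual rows of W show which component encodes which concept;
# that component's acoustic rows are its learned word-like pattern.
component_for_tag = np.argmax(W[n_acoustic:], axis=1)
print(component_for_tag)
```

Because the one-hot visual rows are factorized jointly with the acoustic rows, each latent component is forced to pair an acoustic pattern with the concept it co-occurs with, which is the cross-modal association idea in miniature.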