Interactive Learning of Spoken Words and Their Meanings Through an Audio-Visual Interface

Authors:
Naoto Iwahashi
Venue:
IEICE Transactions on Information and Systems
Year:
2008

Abstract

This paper presents a new interactive learning method for spoken word acquisition through human-machine audio-visual interfaces. During learning, the machine uses both speech and visual cues to decide whether a spoken input word belongs to the lexicon it has already learned. Learning is carried out online and incrementally, combining the principles of active and unsupervised learning. If the machine judges with high confidence that its decision is correct, it learns, without supervision, the statistical models of the word and of the corresponding image category that serves as its meaning. Otherwise, it actively asks the user a question. The function used to estimate the degree of confidence is also learned adaptively online. Experimental results show that combining active and unsupervised learning enables the machine and the user to adapt to each other, making the learning process more efficient.
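The abstract describes the decision loop only at a high level. The Python sketch below illustrates the general idea of switching between unsupervised updates and user questions based on a confidence function that is itself trained online. Every concrete choice is an assumption made for illustration, not the paper's method. Words and image categories are reduced to fixed-length feature vectors with running-mean prototypes instead of the statistical models the paper learns. Speech and visual evidence are combined by summing squared distances. The confidence function is a one-feature logistic model trained by stochastic gradient descent on the user's answers. The names `InteractiveLearner`, `ask_user`, `threshold`, and `lr` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable

Vector = list[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Entry:
    """One lexicon item (assumed representation): a speech prototype paired with an image prototype."""
    speech: Vector
    image: Vector
    count: int = 1

    def update(self, speech: Vector, image: Vector) -> None:
        # Incremental running-mean update of both prototypes.
        self.count += 1
        self.speech = [m + (x - m) / self.count for m, x in zip(self.speech, speech)]
        self.image = [m + (x - m) / self.count for m, x in zip(self.image, image)]


class InteractiveLearner:
    """Hypothetical sketch: unsupervised updates when confident, user questions otherwise."""

    def __init__(self, ask_user: Callable[[int], bool], threshold: float = 0.9, lr: float = 0.5):
        self.lexicon: list[Entry] = []
        self.ask_user = ask_user      # assumed callback: "Is this word #k?" -> yes/no
        self.threshold = threshold
        self.lr = lr
        self.w, self.b = 1.0, 0.0     # parameters of the (assumed logistic) confidence function
        self.questions = 0

    def confidence(self, d: float) -> float:
        # Estimated probability that the best-matching entry is correct, given its distance d.
        return sigmoid(self.b - self.w * d)

    def observe(self, speech: Vector, image: Vector) -> int:
        """Process one (spoken word, scene) pair; return the index of the entry it is assigned to."""
        if not self.lexicon:
            # Nothing to compare against yet: store the first input as a new word.
            self.lexicon.append(Entry(list(speech), list(image)))
            return 0
        # Combine speech and visual evidence into one distance per lexicon entry.
        dists = [sq_dist(speech, e.speech) + sq_dist(image, e.image) for e in self.lexicon]
        k = min(range(len(dists)), key=dists.__getitem__)
        d = dists[k]
        if self.confidence(d) >= self.threshold:
            # Unsupervised step: trust the machine's own decision.
            self.lexicon[k].update(speech, image)
            return k
        # Active step: ask the user, then train the confidence function on the answer
        # (one SGD step on the logistic loss with respect to b and w).
        self.questions += 1
        correct = self.ask_user(k)
        err = self.confidence(d) - (1.0 if correct else 0.0)
        self.b -= self.lr * err
        self.w += self.lr * err * d
        if correct:
            self.lexicon[k].update(speech, image)
            return k
        # Simplification: a "no" answer is treated as a new word.
        self.lexicon.append(Entry(list(speech), list(image)))
        return len(self.lexicon) - 1


# Toy simulation (hypothetical data): two words, each always heard with the same scene.
WORDS = {"A": ([0.0, 0.0], [0.0, 0.0]), "B": ([5.0, 5.0], [5.0, 5.0])}
entry_words: list[str] = []   # true word behind each entry, known only to the simulated user
state = {"current": ""}


def simulated_user(k: int) -> bool:
    return entry_words[k] == state["current"]


learner = InteractiveLearner(simulated_user)
for i in range(100):
    state["current"] = "A" if i % 2 == 0 else "B"
    k = learner.observe(*WORDS[state["current"]])
    if k == len(entry_words):         # a new lexicon entry was created
        entry_words.append(state["current"])
print(len(learner.lexicon), learner.questions)
```

In this toy run the learner asks the simulated user about twenty questions early on; once the confidence function has been trained, matching inputs are accepted without asking. This is only a loose, much simplified analogue of the mutual adaptation the abstract reports, and a "no" answer is naively mapped to a new word, whereas a real system would also have to handle confusions between existing entries.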