Spoken language processing: where do we go from here?

  • Authors: Roger K. Moore
  • Affiliations: Dept. Computer Science, University of Sheffield, Sheffield, UK
  • Venue: Your Virtual Butler
  • Year: 2013

Abstract

Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction, driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially, and yet performance appears to be reaching an asymptote that is not only well short of human performance but may also be inadequate for many real-world applications. This situation suggests that there may be a fundamental flaw in the underlying architecture of contemporary speech-based systems, and the future direction for research into spoken language processing is currently uncertain. This chapter addresses these issues by stepping outside the familiar domains of speech science and technology and instead drawing inspiration from recent findings in fields of research concerned with the neurobiology of living systems in general. In particular, four areas are highlighted: the growing evidence for an intimate relationship between sensor and motor behaviour in living organisms; the power of negative feedback control to accommodate unpredictable disturbances in real-world environments; mechanisms for imitation and mental imagery for learning and modelling; and hierarchical models of temporal memory for predicting future behaviour and anticipating the outcome of events. The chapter shows how these results point towards a novel architecture for speech-based human-machine interaction that blurs the distinction between the core components of a traditional spoken language dialogue system: an architecture in which cooperative and communicative behaviour emerges as a by-product of a model of interaction where the system has in mind the needs and intentions of a user, and a user has in mind the needs and intentions of the system. It concludes with a roadmap of technical prerequisites and desiderata that would seem to be necessary if voice-based interaction with an autonomous agent such as a virtual butler is to become a practical reality.