Spoken language processing: where do we go from here?

Authors: Roger K. Moore
Affiliations: Dept. Computer Science, University of Sheffield, Sheffield, UK
Venue: Your Virtual Butler
Year: 2013

Abstract

Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction, driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially, and yet performance appears to be reaching an asymptote that is not only well short of human performance, but which may also be inadequate for many real-world applications. This situation suggests that there may be a fundamental flaw in the underlying architecture of contemporary speech-based systems, and the future direction for research into spoken language processing is currently uncertain. This chapter addresses these issues by stepping outside the familiar domains of speech science and technology, and instead draws inspiration from recent findings in fields of research that are concerned with the neurobiology of living systems in general. In particular, four areas are highlighted: the growing evidence for an intimate relationship between sensor and motor behaviour in living organisms, the power of negative feedback control to accommodate unpredictable disturbances in real-world environments, mechanisms for imitation and mental imagery for learning and modelling, and hierarchical models of temporal memory for predicting future behaviour and anticipating the outcome of events. The chapter shows how these results point towards a novel architecture for speech-based human-machine interaction that blurs the distinction between the core components of a traditional spoken language dialogue system: an architecture in which cooperative and communicative behaviour emerges as a by-product of a model of interaction where the system has in mind the needs and intentions of a user, and a user has in mind the needs and intentions of the system. It concludes with a roadmap of technical prerequisites and desiderata that would seem to be necessary if voice-based interaction with an autonomous agent such as a virtual butler is to become a practical reality.