Art and Business of Speech Recognition: Creating the Noble Voice

Authors:
Blade Kotelly
Affiliations:
-
Venue:
Art and Business of Speech Recognition: Creating the Noble Voice
Year:
2003

Citing 0
Cited 6

An interactive speech interface for summarizing agile project planning meetings

CHI '06 Extended Abstracts on Human Factors in Computing Systems
An intelligent speech interface for personal assistants applied to knowledge management

Web Intelligence and Agent Systems
HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and Other Nontraditional Interfaces

HCI Beyond the GUI: Design for Haptic, Speech, Olfactory, and Other Nontraditional Interfaces
Hidden menu options in automated human-computer telephone dialogues: dissonance in the user's mental model

Behaviour & Information Technology
Bootstrapping spoken dialogue systems by exploiting reusable libraries

Natural Language Engineering
Using reinforcement learning to create communication channel management strategies for diverse users

SLPAT '10 Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies


Abstract

From the Book: Speech-recognition technology is being used increasingly in a variety of over-the-phone applications in the transportation, financial, telecommunications, and other industries. Most people by now have experienced at least one automated phone system, in which questions are posed by a computer to callers seeking help or information, and the callers' spoken answers are understood and acted on by that computer. In the best cases, the computer even sounds human! Given the rapid emergence of these applications, the time seems right for a concise overview of the topic: a guide for business managers looking for new and better ways to engage and serve their customers, a resource for designers getting started with voice applications (or refreshing their current knowledge), and a good read for anyone interested in this exciting technology. Having built applications in many of these industries, I feel well positioned, and eager, to explain important aspects of the art and business of speech recognition, and to illuminate the extraordinary returns a well-designed and well-deployed over-the-phone system can yield: increased revenues, lower costs, customer satisfaction and retention, and brand development. Speech-recognition technology can give a company an identifiable and welcome voice, even a noble voice, and in this book I mean to show how.

The key to voice-application success, as I demonstrate in a variety of examples, is a well-constructed user interface: interactions between the system and its users that are both pleasing and effective. The best voice interfaces avoid the confusion or annoyance of touch-tone systems and the expense of operators or customer representatives. A good voice interface can solve critical business problems. This book certainly is not meant to be exhaustive in its coverage of speech-recognition technology, or to be academically rigorous in its style.
Other fine books delve more deeply and exclusively into particular elements of design, such as brainstorming techniques and usability-testing principles. Rather, this book is the product of one experienced, business-focused practitioner (me) talking about what works in this domain, and what does not. I speak from the real-world perspective of a designer who has had to fight deadlines, work around technological limitations, and satisfy and excite the system's users, all while meeting contractual obligations to the client and financial expectations. (That experience will sound familiar to all practitioners!) From this perspective, I convey insights and advice about user-interface design, production, testing, and deployment that I hope will help others plan and build their own successful speech-recognition applications.

Readers familiar with other software development processes will see some similarities between the design of over-the-phone speech-recognition systems and the design of systems for other media. This book, however, focuses on what is unique to designing speech-recognition systems. It is important for client companies and designers alike to understand these unique elements in order to produce a system that works optimally for both the client company and its calling population. To keep the book's focus on more fundamental design concepts, I have chosen not to discuss specific technologies, algorithms, or programming methodologies that may be popular at the moment but that will likely change with each passing year, or perhaps soon become obsolete. Thus, I do not cover VoiceXML, SSML, or certain proprietary software packages currently in use; although important and popular today, these changing standards are less central to understanding the basic principles.
There are two principal audiences for this book: people who want to be effective speech-recognition user-interface designers, and people who want to understand or profit from these systems. Prospective designers should find all chapters of the book useful. Other readers, particularly call-center managers, programmers/implementers, and project managers, are likely to benefit most from the chapters that address the design and deployment process and the ideas that drive it (Chapters 5, 6, 8, and 9).

After you've read this book, you will have a fundamental understanding of what goes into the design, production, testing, and deployment of over-the-phone speech-recognition applications. You'll have learned design guidelines, tips, and techniques that help ensure that an application works well and that people enjoy using it. Inasmuch as examples are often the best way to learn, you'll have seen how other designers have dealt with real-life issues to solve real business problems. By addressing the main principles behind the creation of speech-recognition systems, I hope to have shown you the tight connection between the process of solving business problems using speech-recognition technology and the art of designing those systems.

Philosophy

No matter how immersed they may be in the minutiae, designers of over-the-phone speech-recognition systems must never lose sight of one overarching goal: these systems are made to help people do what they have to do. Whether the application is mundane (home banking) or flashy (entertainment or infotainment, such as a voice portal), the goal is the same: to help people accomplish their tasks swiftly, easily, and unobtrusively.

Every designer should be guided by a philosophy. Having a philosophy gives a designer both a starting position and a compass for navigating the plethora of decisions that must be made along the way.
At the very least, having a philosophy enables a designer to answer the "Why?" question at each point in the design process. Why am I writing this question? Why am I using a particular word here, in this context? Why will this design solve the problem better than another design?

When I was about four years old, my mom taught me that if I was ever unsure whether to do something, I could simply think about the Golden Rule: "Do unto others as you would have them do unto you." Her point was to make sure that I considered other people's perspectives before I did something that might affect them. I've applied this rule, both consciously and subconsciously, throughout my life in a variety of situations.

When I was a teenager, I became driven by the idea that I should design things (at that point I wasn't sure just what things) that improved people's lives. Eventually, this winding road led me to become a speech-recognition system designer. And when I first approached problems and thought about how to develop the best solutions, I wandered back to the Golden Rule as a way to solve design problems on a variety of levels. I've found that this rule is especially applicable to the design of speech-recognition systems.

When designing a system, designers need to put themselves mentally in the place of the people calling into it. If we can understand what it's like to be those people in their varying moods (happy, angry, confused, rushed, impatient), we can design a system to accommodate them. Empathizing is not always easy, but we can often satisfy callers by understanding who they are and why they're calling, and then deciding on the most appropriate way to handle different callers and different situations. Some situations are easy to design for; others present problems that require more work to solve. In both cases, knowledge of psychology will aid the designer more than knowledge of technology alone.