Semi-synchronous speech and pen input for mobile user interfaces

  • Authors:
  • Koichi Shinoda;Yasushi Watanabe;Kenji Iwata;Yuan Liang;Ryuta Nakagawa;Sadaoki Furui

  • Affiliations:
  • Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku 152-8552, Japan (all authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2011

Abstract

This paper proposes new interfaces for mobile environments that use semi-synchronous speech and pen input. The user speaks while writing, and the pen input complements the speech so that recognition performance is higher than with speech alone. Because the two modes, speaking and writing, differ in input speed and in the information they carry, a time lag always exists between them, so conventional multi-modal recognition algorithms cannot be applied directly to this interface. To tackle this problem, we developed a multi-modal recognition algorithm that handles this asynchronicity (time lag) by using a segment-based unification scheme together with a method of adapting to the time-lag characteristics of individual users. Five different pen-input interfaces, in each of which pen input is assumed to be given for each phrase unit of speech, were evaluated in speech recognition experiments using noisy speech data. With the proposed method, recognition accuracy was higher than that of speech alone for all five interfaces. We also carried out a subjective test to examine the usability of each interface and found a trade-off between usability and improvement in recognition performance.
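
The abstract does not spell out the unification algorithm, so the following is only an illustrative sketch of the general idea of pairing speech phrase segments with pen-input segments under a user-specific time lag. Everything in it is an assumption made for illustration and is not taken from the paper: the `Segment` type, the functions `estimate_user_lag` and `pair_segments`, the use of a mean start-time offset as the "time-lag characteristic", the greedy nearest-start matching, and the `tolerance` parameter.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """A hypothetical time-stamped unit (speech phrase or pen stroke group)."""
    label: str
    start: float  # seconds
    end: float    # seconds


def estimate_user_lag(calibration: List[Tuple[Segment, Segment]]) -> float:
    """Assumed adaptation step: mean of (pen start - speech start)
    over calibration pairs that are known to belong together."""
    return mean(pen.start - speech.start for speech, pen in calibration)


def pair_segments(
    speech: List[Segment],
    pen: List[Segment],
    lag: float,
    tolerance: float = 0.5,
) -> List[Tuple[Segment, Optional[Segment]]]:
    """Greedily pair each speech segment with the unused pen segment whose
    lag-compensated start time is closest, if it lies within `tolerance`."""
    used = set()
    pairs = []
    for sp in speech:
        best_index, best_dist = None, tolerance
        for i, pn in enumerate(pen):
            if i in used:
                continue
            dist = abs((pn.start - lag) - sp.start)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is None:
            pairs.append((sp, None))  # no pen input found for this phrase
        else:
            used.add(best_index)
            pairs.append((sp, pen[best_index]))
    return pairs


# Hypothetical data: the user writes roughly 0.8 s after starting to speak.
calibration = [
    (Segment("a", 0.0, 0.5), Segment("A", 0.7, 1.0)),
    (Segment("b", 1.0, 1.5), Segment("B", 1.9, 2.3)),
]
lag = estimate_user_lag(calibration)

speech = [Segment("tokyo", 0.0, 0.5), Segment("station", 0.6, 1.2)]
pen = [Segment("T", 0.8, 1.5), Segment("S", 1.5, 2.0)]

paired = pair_segments(speech, pen, lag)
unpaired = pair_segments(speech, pen, lag=0.0)
```

In this toy setting, compensating for the estimated lag lets both phrases find their pen segments, whereas assuming zero lag leaves the first phrase without a match. A real system would combine recognition hypotheses from both modes rather than match on start times alone.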