Streaming speech3: a framework for generating and streaming 3D text-to-speech and audio presentations to wireless PDAs as specified using extensions to SMIL

  • Authors:
  • Stuart Goose;Sreedhar Kodlahalli;William Pechter;Rune Hjelsvold

  • Affiliations:
  • Siemens Corporate Research, Inc., Princeton, NJ;Siemens Corporate Research, Inc., Princeton, NJ;Siemens Corporate Research, Inc., Princeton, NJ;Siemens Corporate Research, Inc., Princeton, NJ

  • Venue:
  • Proceedings of the 11th international conference on World Wide Web
  • Year:
  • 2002

Abstract

While monochrome unformatted text and richly colored graphical content are both capable of conveying a message, well-designed graphical content has the potential to better engage the human sensory system. It is our contention that the author of an audio presentation should likewise be able to judiciously exploit human aural perception to deliver content in a more compelling, concise and realistic manner. While contemporary streaming media players and voice browsers share the ability to render content non-textually, neither technology is currently capable of rendering three-dimensional media. The contributions described in this paper are a set of proposed 3D audio extensions to SMIL and a server-based framework that, on receiving a request, processes a SMIL file using these extensions on demand: it dynamically creates the multiple simultaneous audio objects, spatializes them in 3D space, multiplexes them into a single stereo audio signal, and prepares that signal for transmission as an audio stream to a mobile device. To the authors' knowledge, this is the first reported solution for delivering a rich 3D audio listening experience, described by a markup language, to a commercially available wireless handheld device and rendering it there. In addition to mobile devices, the solution also works with desktop streaming media players.
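The spatialize-and-multiplex step summarized in the abstract can be illustrated with a deliberately simplified sketch. It assumes each audio object is a mono list of samples in [-1, 1] with an azimuth and a distance, and it substitutes constant-power stereo panning plus inverse-distance attenuation for true 3D spatialization (for example, HRTF filtering). The names `pan_gains` and `mix_to_stereo` are hypothetical and are not taken from the paper's framework.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical source description: (mono samples, azimuth in degrees, distance).
Source = Tuple[Sequence[float], float, float]


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power pan: -90 = hard left, 0 = centre, +90 = hard right.

    A simplified stand-in for real 3D spatialization, which would also
    model elevation and head-related filtering.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)  # maps to 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def mix_to_stereo(sources: List[Source]) -> Tuple[List[float], List[float]]:
    """Place each mono source by azimuth and distance, then sum all of them
    into a single stereo pair suitable for one outgoing audio stream."""
    length = max((len(s) for s, _, _ in sources), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, azimuth, distance in sources:
        g_left, g_right = pan_gains(azimuth)
        attenuation = 1.0 / max(distance, 1.0)  # assumed inverse-distance roll-off
        for i, x in enumerate(samples):
            left[i] += x * g_left * attenuation
            right[i] += x * g_right * attenuation
    # Scale down only if the summed signal would clip outside [-1, 1].
    peak = max((abs(v) for v in left + right), default=0.0)
    if peak > 1.0:
        left = [v / peak for v in left]
        right = [v / peak for v in right]
    return left, right
```

For example, `mix_to_stereo([(voice, -90.0, 1.0), (music, 90.0, 2.0)])` would place a voice hard left at full level and background music hard right at half level, producing the kind of single stereo signal a streaming player can render without any 3D support of its own.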