Audio interaction with multimedia information

  • Authors: Mario Malcangi

  • Affiliations: Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy

  • Venue: CIMMACS'09 Proceedings of the 8th WSEAS International Conference on Computational intelligence, man-machine systems and cybernetics
  • Year: 2009

Abstract

Interacting with multimedia information stored in systems or on the web exposes several difficulties inherent in the signal nature of such information. These difficulties are especially evident when palmtop devices are used for this purpose. Developing and integrating a set of algorithms for extracting audio information is a primary step toward providing user-friendly access to multimedia information and developing powerful communication interfaces. Audio has several advantages over other communication media, including hands-free operation, unattended interaction, and simple, cheap devices for capture and playback. A set of algorithms and processes for extracting semantic and syntactic information from audio signals, including voice, was defined. The extracted information was used to access information in multimedia databases, as well as to index it. More extensive, higher-level information, such as audio-source identification (speaker identification) and genre (in the case of music), must also be extracted from the audio signal. One basic task involves transforming audio into symbols (e.g., music into a score, speech into text) and converting symbols back into audio (e.g., a score into musical audio, text into speech). The purpose is to search for and access any kind of multimedia information by means of audio. To attain these results, digital audio processing, digital speech processing, and soft-computing methods need to be integrated.
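
As a rough illustration of the audio-to-symbol task mentioned in the abstract, the following minimal Python sketch labels one audio frame with a note name. It is not the method used in the paper, which the abstract does not specify. It assumes a single clean monophonic tone, a plain autocorrelation pitch estimator, and equal temperament with A4 = 440 Hz. The function names, the 8 kHz sample rate, and the 50 to 1000 Hz search range are all hypothetical choices made for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def estimate_pitch_hz(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate pitch as sample_rate / lag, for the lag with the largest autocorrelation.

    Assumption: the frame holds one steady, monophonic tone.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def hz_to_note_name(freq_hz):
    """Map a frequency to the nearest equal-tempered note name (A4 = 440 Hz, MIDI 69)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# Synthetic test signal (an assumption for the demo): a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440.0 * n / sample_rate) for n in range(1024)]

pitch = estimate_pitch_hz(frame, sample_rate)
note = hz_to_note_name(pitch)
print(note, round(pitch, 1))
```

Because the lag is an integer number of samples, the frequency estimate is quantized. A practical system would likely interpolate around the peak, handle polyphony and noise, and segment notes in time before producing anything resembling a score.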