Audio interaction with multimedia information

Authors:
Mario Malcangi
Affiliations:
Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy
Venue:
CIMMACS'09 Proceedings of the 8th WSEAS International Conference on Computational intelligence, man-machine systems and cybernetics
Year:
2009

Abstract

Interacting with multimedia information stored in computer systems or on the web raises several difficulties inherent in the signal nature of such information. These difficulties are especially evident when palmtop devices are used. Developing and integrating a set of algorithms for extracting audio information is a primary step toward providing user-friendly access to multimedia information and building powerful communication interfaces. Audio has several advantages over other communication media, including hands-free operation, unattended interaction, and simple, inexpensive devices for capture and playback. A set of algorithms and processes was defined for extracting semantic and syntactic information from audio signals, including voice. The extracted information was used both to access and to index information in multimedia databases. Richer, higher-level information, such as the identity of the audio source (speaker identification) and, for music, the genre, must also be extracted from the audio signal. One basic task is transforming audio into symbols (e.g. music into a score, speech into text) and symbols back into audio (e.g. a score into musical audio, text into speech). The goal is to search for and access any kind of multimedia information by means of audio. Achieving this requires integrating digital audio processing, digital speech processing, and soft-computing methods.
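The abstract does not state which low-level features the author's algorithms compute. As a minimal sketch of the kind of first-stage analysis an audio-indexing pipeline might perform, assuming mono floating-point samples split into fixed-length frames, the Python code below computes two classic frame-level descriptors, short-time energy and zero-crossing rate (ZCR), and then applies a simple threshold rule to label each frame. The function names `frame_features` and `label_frames`, the label strings, and the threshold values are hypothetical illustrations, not part of the original work.

```python
import math
from typing import List, Tuple


def frame_features(signal: List[float], frame_len: int, hop: int) -> List[Tuple[float, float]]:
    """Return (short-time energy, zero-crossing rate) for each full frame.

    Energy is the mean of the squared samples in the frame; ZCR is the
    fraction of adjacent sample pairs whose signs differ (zero counts as
    non-negative). Trailing samples that do not fill a frame are ignored.
    """
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop >= 1")
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def label_frames(feats: List[Tuple[float, float]], energy_thresh: float, zcr_thresh: float) -> List[str]:
    """Hypothetical rule-based labeling: low-energy frames are 'silence';
    otherwise frames are split by ZCR into 'low-zcr' (e.g. tonal or voiced
    sounds) and 'high-zcr' (e.g. noise-like or unvoiced sounds)."""
    labels = []
    for energy, zcr in feats:
        if energy < energy_thresh:
            labels.append("silence")
        elif zcr > zcr_thresh:
            labels.append("high-zcr")
        else:
            labels.append("low-zcr")
    return labels


if __name__ == "__main__":
    sr = 8000  # assumed sample rate in Hz
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]  # 200 Hz tone
    buzz = [0.5 if n % 2 == 0 else -0.5 for n in range(800)]  # maximally noisy sign pattern
    feats = frame_features(tone + [0.0] * 800 + buzz, frame_len=200, hop=200)
    print(label_frames(feats, energy_thresh=0.01, zcr_thresh=0.3))
```

Energy and ZCR are used here because they are cheap enough for the palmtop devices the abstract mentions; a real system of the kind described would more likely feed richer features (for example, spectral or cepstral coefficients) into trained or soft-computing classifiers rather than fixed thresholds.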