Multi-method audio-based retrieval of multimedia information

Authors:
Mario Malcangi
Affiliations:
Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milano, Italy
Venue:
WSEAS Transactions on Information Science and Applications
Year:
2010

Abstract

Multimedia information and embedded systems are two major technological advances that have significantly changed the way people interact with systems and information in recent years. In this context, audio proves to be the most advantageous medium for interacting with embedded systems and their content. Its advantages include hands-free operation, unattended interaction, and simple, cheap devices for capture and playback. Using embedded systems to seek information stored locally or on the web highlights several difficulties inherent in the nature of multimedia-information signals. These difficulties are especially evident when palmtop or deeply embedded devices are used for such purposes. Developing a set of digital-signal-processing-based algorithms for extracting audio information is a primary step toward providing user-friendly access to multimedia information and developing powerful communication interfaces. The algorithms aim to extract semantic and syntactic information from audio signals, including voice. Extracted audio features are employed to access information in multimedia databases, as well as to index it. More extensive, higher-level information, such as audio-source identification (speaker identification) and genre (in the case of music), must also be extracted from the audio signal. One basic task involves transforming audio into symbols (e.g. music transformed into a score, speech transformed into text) and transcribing symbols back into audio (e.g. a score transformed into musical audio, text transformed into speech). The purpose is to search for and access any kind of multimedia information by means of audio. To attain these results, digital audio processing, digital speech processing, and soft-computing methods need to be integrated. Neural networks are used as classifiers, and fuzzy logic is used for making smart decisions.
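
The abstract does not specify which audio features are extracted or how the fuzzy decisions are formed, so the following is only a minimal sketch of the general idea, not the author's method. It assumes two common frame-level features (short-time energy and zero-crossing rate) and combines them with a simple fuzzy rule. All function names and threshold values here are hypothetical and chosen for illustration.

```python
import math

def frames(signal, size, hop):
    """Split a list of samples into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def ramp(x, lo, hi):
    """Piecewise-linear fuzzy membership: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def tonal_degree(frame):
    """Fuzzy AND (minimum) of 'high energy' and 'low zero-crossing rate'.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    high_energy = ramp(short_time_energy(frame), 0.01, 0.1)
    low_zcr = 1.0 - ramp(zero_crossing_rate(frame), 0.1, 0.4)
    return min(high_energy, low_zcr)

# Usage: a 200 Hz tone sampled at 8 kHz, compared with digital silence.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1024)]
silence = [0.0] * 1024
tone_scores = [tonal_degree(f) for f in frames(tone, 256, 128)]
silence_scores = [tonal_degree(f) for f in frames(silence, 256, 128)]
```

In a fuller system of the kind the abstract describes, per-frame feature values like these would feed a trained classifier such as a neural network, with the fuzzy stage combining several such outputs into a final decision.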