Nearest-neighbor automatic sound annotation with a WordNet taxonomy

Authors:
Pedro Cano;Markus Koppenberger;Sylvain Le Groux;Julien Ricard;Nicolas Wack;Perfecto Herrera
Affiliations:
Music Technology Group, Institut Universitari de l'Audiovisual, Universitat Pompeu Fabra, Barcelona, Spain (all authors)
Venue:
Journal of Intelligent Information Systems - Special issue: Intelligent multimedia applications
Year:
2005

Abstract

Sound engineers need access to vast collections of sound effects for their film and video productions. Sound effects providers rely on text-retrieval techniques to give access to their collections. Currently, audio content is annotated manually, which is an arduous task. Automatic annotation methods, normally fine-tuned to restricted domains such as musical instruments or limited sound effects taxonomies, are not mature enough to label any possible sound in great detail. A general sound recognition tool would require, first, a taxonomy that represents the world and, second, thousands of classifiers, each specialized in distinguishing fine details. We report experimental results on a general sound annotator. To tackle the taxonomy definition problem, we use WordNet, a semantic network that organizes real-world knowledge. To avoid the need for a huge number of classifiers to distinguish many different sound classes, we use a nearest-neighbor classifier with a database of isolated sounds unambiguously linked to WordNet concepts. A concept prediction rate of 30% is achieved on a database of over 50,000 sounds and over 1,600 concepts.
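The nearest-neighbor idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three-dimensional feature vectors, the concept labels, the Euclidean distance, and the names `annotate` and `DATABASE` are hypothetical, since the abstract does not state which audio features or distance measure the authors use. The sketch only shows the general technique: a query sound inherits the concepts of the closest labeled sound in the database, so no per-class classifier has to be trained.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def annotate(query, database):
    """Return the concept labels of the database entry nearest to `query`.

    `database` is a list of (feature_vector, concepts) pairs. The labels
    stand in for WordNet concepts; they are plain strings in this sketch.
    """
    _, concepts = min(database, key=lambda entry: euclidean(query, entry[0]))
    return concepts

# Hypothetical toy database: made-up features and concept labels.
DATABASE = [
    ((0.9, 0.1, 0.3), ["dog", "bark"]),
    ((0.2, 0.8, 0.5), ["rain"]),
    ((0.4, 0.4, 0.9), ["door", "slam"]),
]

# The query is closest to the first entry, so it inherits its concepts.
print(annotate((0.85, 0.15, 0.25), DATABASE))
```

Adding a new sound class in this scheme only requires adding labeled examples to the database, which is the property the abstract relies on to cover many concepts without thousands of specialized classifiers.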