A scale-free distribution of false positives for a large class of audio similarity measures

Authors:
Jean-Julien Aucouturier; Francois Pachet
Affiliations:
Ikegami Laboratory, Graduate School of Arts and Sciences, The University of Tokyo, Japan; SONY Computer Science Laboratory, Paris, France
Venue:
Pattern Recognition
Year:
2008

Abstract

The "bag-of-frames" (BOF) approach to audio pattern recognition models a signal as the long-term statistical distribution of its local spectral features; a prototypical implementation uses Gaussian Mixture Models (GMMs) of Mel-Frequency Cepstrum Coefficients (MFCCs). This approach is the predominant paradigm for extracting high-level descriptions from music signals, such as their instrument, genre or mood, and can also be used to compute direct timbre similarity between songs. However, a recent study by the authors shows that, when applied to music, this class of algorithms tends to produce false positives that are almost always the same songs, regardless of the query. In other words, such models give rise to songs, which we call hubs, that are irrelevantly close to very many other songs. This paper reports on a number of experiments, using implementations on large music databases, that aim to better understand the nature and causes of such hub songs. We introduce two measures of "hubness": the number of n-occurrences and the mean neighbor angle. We find that in typical music databases, hubs follow a scale-free distribution: non-hub songs are extremely common and large hubs are extremely rare, but they do exist. Moreover, we establish that hubs are not a property of a given modelling strategy (e.g. static vs. dynamic, parametric vs. non-parametric) but rather tend to occur with any type of model, though only for data with a certain amount of "heterogeneity" (to be defined). This suggests that hubs could be an important phenomenon that generalizes beyond the specific problem of music modelling, and that they reflect a general structural property of an important class of pattern recognition algorithms.
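To make the n-occurrence measure concrete, the following is a minimal sketch assuming the common definition: the n-occurrence of a song is the number of other songs whose n nearest neighbors include it. It works from a precomputed distance matrix. In the paper's setting that matrix would hold a timbre distance between GMM-MFCC models; here, purely as an illustrative assumption, the toy data are 2-D points compared with Euclidean distance. The function names `pairwise_distances` and `n_occurrences` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pairwise_distances(points: Sequence[Point]) -> List[List[float]]:
    """Hypothetical stand-in for a timbre distance: plain Euclidean distance."""
    return [[math.dist(p, q) for q in points] for p in points]


def n_occurrences(dist: Sequence[Sequence[float]], n: int) -> List[int]:
    """For each item, count how many other items have it among their n nearest neighbors."""
    size = len(dist)
    if not 0 < n < size:
        raise ValueError("n must satisfy 0 < n < number of items")
    counts = [0] * size
    for i in range(size):
        # Rank all other items by distance to query i (the query itself is excluded).
        # Ties are broken by index so that the result is deterministic.
        others = sorted((j for j in range(size) if j != i), key=lambda j: (dist[i][j], j))
        for j in others[:n]:
            counts[j] += 1
    return counts


if __name__ == "__main__":
    # Toy set: one central point surrounded by four others.
    pts = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    counts = n_occurrences(pairwise_distances(pts), n=1)
    print(counts)  # [4, 1, 0, 0, 0]: the central point is a hub
```

Because every item contributes exactly n neighbors, the n-occurrences always sum to N·n and average n over a database of N items. A hub is therefore an item whose count lies far above n, like the central point in the toy set. A scale-free hubness distribution would show up as a heavy-tailed histogram of these counts.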