Many sound-related applications use Mel-Frequency Cepstral Coefficients (MFCCs) to describe audio timbral content. Most research on MFCCs has focused on different classification and clustering algorithms, the use of complementary audio descriptors, or the effect of different distance measures. This paper instead focuses on the statistical properties of the MFCC descriptor itself. To that end, we use a simple encoding process that maps a short-time MFCC vector to a dictionary of binary code-words. We study and characterize the rank-frequency distribution of these MFCC code-words for speech, music, and environmental sound sources. We show that, regardless of the sound source, MFCC code-words follow a shifted power-law distribution: a few code-words occur very frequently and many occur rarely. We also observe that the most frequent code-words have characteristic internal patterns. For instance, in music signals, nearby MFCC coefficients tend to have similar quantization values. Finally, we study the rank-frequency distributions of individual music recordings and show that they exhibit the same type of heavy-tailed distribution found in the large-scale databases. We exploit this fact in two supervised semantic inference tasks: genre and instrument classification. In particular, using just 50 (properly selected) frames, we obtain classification results similar to those obtained by considering all frames in the recordings. Beyond this particular example, we believe that the power-law distribution of MFCC frames could have important implications for future audio-based applications.
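
The encoding and rank-frequency steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the quantization rule, so the per-coefficient median threshold used here is an assumption, as are the function names `encode_frames` and `rank_frequency`. The input is synthetic Gaussian noise standing in for real MFCC frames, so the resulting counts are not expected to show the power-law shape reported for real audio; the sketch only shows the mechanics of building the rank-frequency table.

```python
import random
from collections import Counter
from statistics import median


def encode_frames(frames):
    """Map each frame (a list of coefficients) to a binary code-word string.

    Assumption: each coefficient is thresholded at its median over all
    frames. The abstract does not state the actual quantization rule.
    """
    n_coeffs = len(frames[0])
    thresholds = [median(f[i] for f in frames) for i in range(n_coeffs)]
    return [
        "".join("1" if f[i] > thresholds[i] else "0" for i in range(n_coeffs))
        for f in frames
    ]


def rank_frequency(codewords):
    """Return (rank, code-word, count) tuples sorted by decreasing count."""
    counts = Counter(codewords).most_common()
    return [(rank, cw, c) for rank, (cw, c) in enumerate(counts, start=1)]


# Synthetic stand-in for MFCC frames: 1000 frames of 4 coefficients each.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(1000)]

codes = encode_frames(frames)
table = rank_frequency(codes)

# Print the five most frequent code-words with their ranks and counts.
for rank, cw, count in table[:5]:
    print(rank, cw, count)
```

With real MFCC frames in place of the synthetic data, plotting count against rank on log-log axes would be the natural way to inspect whether the distribution is heavy-tailed.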