Modulation-scale analysis for content identification

Authors:
S. Sukittanon;L.E. Atlas;J.W. Pitton
Affiliations:
Dept. of Electr. Eng., Univ. of Washington, Seattle, WA, USA;-;-
Venue:
IEEE Transactions on Signal Processing - Part II
Year:
2004

Cited by (6):

Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features
IEEE Transactions on Multimedia

Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification
IEEE Transactions on Audio, Speech, and Language Processing

Distance metric learning for content identification
IEEE Transactions on Information Forensics and Security

Discrimination of speech from nonspeech in broadcast news based on modulation frequency features
Speech Communication

Identifying the classical music composition of an unknown performance with wavelet dispersion vector and neural nets
Information Sciences: an International Journal

Dynamic musical orchestration using genetic algorithms and a spectro-temporal description of musical instruments
EvoCOMNET'10 Proceedings of the 2010 international conference on Applications of Evolutionary Computation - Volume Part II

Abstract

For nonstationary signal classification, e.g., speech or music, features are traditionally extracted from a time-shifted, yet short data window. For many applications, these short-term features do not efficiently capture or represent longer term signal variation. Partially motivated by human audition, we overcome the deficiencies of short-term features by employing modulation-scale analysis for long-term feature analysis. Our analysis, which uses time-frequency theory integrated with psychoacoustic results on modulation frequency perception, not only contains short-term information about the signals, but also provides long-term information representing patterns of time variation. This paper describes these features and their normalization. We demonstrate the effectiveness of our long-term features over conventional short-term features in content-based audio identification. A simulated study using a large data set, including nearly 10 000 songs and requiring over a billion audio pairwise comparisons, shows that modulation-scale features improve content identification accuracy substantially, especially when time and frequency distortions are imposed.
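
The general idea of a modulation-scale feature can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the transform, the band layout, or the normalization, so everything below is an assumption made for illustration. The sketch takes a short-time magnitude spectrogram, applies a second Fourier transform along time in each acoustic-frequency channel to obtain modulation frequencies, pools those into octave-wide bands (a stand-in for a constant-Q "scale" axis), and divides each channel by its own zero-modulation (DC) energy as one plausible normalization. The function names, frame sizes, and band edges are hypothetical.

```python
import cmath
import math


def dft_magnitudes(x, n_bins):
    """Magnitudes of the first n_bins DFT coefficients of x (naive O(N^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n_bins)
    ]


def spectrogram(signal, frame_len, hop):
    """Hann-windowed magnitude spectrogram: one row per frame, one column per bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame, frame_len // 2 + 1))
    return frames


def modulation_scale_features(signal, sample_rate, frame_len=32, hop=16,
                              f_lo=0.5, n_bands=6, eps=1e-12):
    """Per-channel modulation energy in octave bands, normalized by DC energy.

    Band k covers modulation frequencies [f_lo * 2**k, f_lo * 2**(k + 1)) Hz.
    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    spec = spectrogram(signal, frame_len, hop)
    n_frames = len(spec)
    frame_rate = sample_rate / hop  # sampling rate of each channel's envelope
    features = []
    for ch in range(len(spec[0])):
        envelope = [row[ch] for row in spec]  # long-term trajectory of one channel
        mod = dft_magnitudes(envelope, n_frames // 2 + 1)
        dc_energy = mod[0] ** 2
        bands = [0.0] * n_bands
        for m in range(1, len(mod)):
            f_mod = m * frame_rate / n_frames
            band = int(math.floor(math.log2(f_mod / f_lo)))
            if 0 <= band < n_bands:
                bands[band] += mod[m] ** 2
        # Dividing by the channel's DC energy removes any fixed per-channel gain.
        features.append([b / (dc_energy + eps) for b in bands])
    return features


# Toy input: a 125 Hz tone whose amplitude is modulated at 6 Hz (2 s at 1 kHz).
sample_rate = 1000
signal = [
    (1 + 0.5 * math.cos(2 * math.pi * 6 * t / sample_rate))
    * math.sin(2 * math.pi * 125 * t / sample_rate)
    for t in range(2000)
]
features = modulation_scale_features(signal, sample_rate)
carrier = features[4]  # acoustic bin 4 is centered on 125 Hz for these settings
dominant_band = max(range(len(carrier)), key=carrier.__getitem__)
print("dominant modulation band index:", dominant_band)
```

With these settings the 6 Hz amplitude modulation should land in the band spanning 4 to 8 Hz (index 3) of the carrier channel, information that a single 32-sample frame cannot represent. Because each channel is divided by its own DC energy, rescaling the input leaves the features essentially unchanged, which hints at why a normalization of this kind can help under frequency distortions. The naive DFT keeps the sketch dependency-free; a practical version would use an FFT library.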