Music/speech classification using high-level features derived from fMRI brain imaging

  • Authors:
  • Xi Jiang; Tuo Zhang; Xintao Hu; Lie Lu; Junwei Han; Lei Guo; Tianming Liu

  • Affiliations:
  • The University of Georgia, Athens, GA, USA; Northwestern Polytechnical University, Xi'an, China; Northwestern Polytechnical University, Xi'an, China; Dolby Laboratories, Beijing, China; Northwestern Polytechnical University, Xi'an, China; Northwestern Polytechnical University, Xi'an, China; The University of Georgia, Athens, GA, USA

  • Venue:
  • Proceedings of the 20th ACM international conference on Multimedia
  • Year:
  • 2012


Abstract

With the availability of large amounts of audio tracks through a variety of sources and distribution channels, automatic music/speech classification has become an indispensable tool for social audio websites and online audio communities. However, the accuracy of current classification methods based on low-level acoustic features is still far from satisfactory. The discrepancy between the limited descriptive power of low-level features and the richness of the high-level semantics perceived by the human brain has become the 'bottleneck' of audio signal analysis. In this paper, functional magnetic resonance imaging (fMRI), which monitors the human brain's response to the natural stimulus of music/speech listening, is used to derive high-level features in the brain imaging space (BIS). We developed a computational framework that models the relationships between BIS features and low-level features on a training dataset with fMRI scans, predicts BIS features for a testing dataset without fMRI scans, and uses the predicted BIS features for music/speech classification in the application stage. Experimental results demonstrate that music/speech classification with the predicted BIS features significantly outperforms classification with the original low-level features.
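The abstract describes a two-stage pipeline: learn a mapping from low-level acoustic features to BIS features on clips that have fMRI scans, predict BIS features for clips without scans, and classify in the predicted BIS space. The following minimal Python sketch illustrates that idea only; the choice of regressor (ridge regression), classifier (linear SVM), feature dimensions, and the synthetic data are illustrative assumptions, not the authors' actual implementation.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVC

# Illustrative stand-ins for the paper's data (all shapes are assumptions).
rng = np.random.default_rng(0)
n_train, n_test = 200, 50
d_low, d_bis = 40, 64                              # low-level dims, BIS dims

X_low_train = rng.normal(size=(n_train, d_low))    # low-level acoustic features (with fMRI)
Y_bis_train = rng.normal(size=(n_train, d_bis))    # BIS features derived from fMRI scans
y_train = rng.integers(0, 2, size=n_train)         # 0 = speech, 1 = music

X_low_test = rng.normal(size=(n_test, d_low))      # low-level features only (no fMRI)

# Stage 1: model the low-level -> BIS relationship on the training set with fMRI.
bis_regressor = Ridge(alpha=1.0).fit(X_low_train, Y_bis_train)

# Stage 2: predict BIS features for test clips that have no fMRI scans.
Y_bis_test_pred = bis_regressor.predict(X_low_test)

# Stage 3: music/speech classification using the (predicted) BIS features.
clf = SVC(kernel="linear").fit(Y_bis_train, y_train)
test_labels = clf.predict(Y_bis_test_pred)
print(test_labels[:10])

In this sketch the classifier is trained on true BIS features and applied to predicted ones; whether the paper trains on true or predicted BIS features, and which regression and classification models it uses, is not specified in the abstract.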