Most chord recognition systems share a common architecture comprising two main stages, feature extraction and pattern matching, and two optional substages, pre-filtering and post-filtering. Understanding how these basic components interact is important not only for achieving optimal performance but also for assessing the potential and limitations of a system. Unfortunately, no existing study sufficiently evaluates the effects of the different approaches to each processing step or the interactions between these steps. In this paper we attempt to remedy this deficiency by performing a systematic evaluation encompassing a wide variety of techniques used for each processing step. We find that filtering has a significant impact on performance, but that providing musical context information in the transition matrix is rendered moot by the need to enforce continuity in the estimations. We also find that the benefits of using complex chord models can be largely offset by an appropriate choice of features. In addition, the initial performance gap between different features was not fully compensated by any subsequent processing stage.
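The four-stage architecture described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the input is assumed to be a precomputed, non-negative chroma matrix of shape (frames, 12); pre-filtering is a moving average; pattern matching is cosine similarity against binary major and minor triad templates; and post-filtering is Viterbi decoding over a transition matrix that is uniform except for a strong self-transition probability, so it enforces continuity without encoding any musical context. All function names and the `width` and `self_prob` parameters are hypothetical.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Binary triad templates (12 major, then 12 minor), unit-normalized."""
    labels, rows = [], []
    for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
        for root in range(12):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            rows.append(t / np.linalg.norm(t))
            labels.append(NOTE_NAMES[root] + suffix)
    return labels, np.array(rows)


def prefilter(chroma, width=5):
    """Pre-filtering: moving average over time on a (frames, 12) chroma matrix."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), 0, chroma
    )


def match(chroma, templates):
    """Pattern matching: cosine similarity of each frame with each template."""
    norms = np.linalg.norm(chroma, axis=1, keepdims=True)
    unit = chroma / np.maximum(norms, 1e-12)
    return unit @ templates.T  # shape (frames, chords)


def postfilter(scores, self_prob=0.9):
    """Post-filtering: Viterbi path under a self-transition-biased uniform matrix."""
    n_frames, n_states = scores.shape
    trans = np.full((n_states, n_states), (1.0 - self_prob) / (n_states - 1))
    np.fill_diagonal(trans, self_prob)
    log_trans = np.log(trans)
    # Similarities are treated as unnormalized observation likelihoods (assumption).
    log_obs = np.log(np.maximum(scores, 1e-12))

    delta = log_obs[0].copy()
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = delta[:, None] + log_trans  # cand[i, j]: best score ending i -> j
        back[t] = np.argmax(cand, axis=0)
        delta = cand[back[t], np.arange(n_states)] + log_obs[t]

    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def recognize(chroma, width=5, self_prob=0.9):
    """Run pre-filtering, pattern matching, and post-filtering in sequence."""
    labels, templates = chord_templates()
    scores = match(prefilter(chroma, width), templates)
    return [labels[i] for i in postfilter(scores, self_prob)]
```

Feature extraction itself (computing chroma from audio) is left outside the sketch. Setting `width=1` disables the pre-filter, and lowering `self_prob` weakens the continuity constraint, which makes it easy to observe how each filtering stage changes the frame-level output on the same features.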