A statistical multidimensional humming transcription using phone level hidden Markov models for query by humming systems

Authors:
Hsuan-Huei Shih;S. S. Narayanan;C.-C. J. Kuo
Affiliations:
Integrated Media Syst. Center, Southern California Univ., Los Angeles, CA, USA;Integrated Media Syst. Center, Southern California Univ., Los Angeles, CA, USA;Integrated Media Syst. Center, Southern California Univ., Los Angeles, CA, USA
Venue:
ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
Year:
2003



Abstract

This research proposes a new phone-level hidden Markov model (HMM) approach to transcribing human humming. A music note has two important attributes: pitch and duration. The proposed system generates multidimensional humming transcriptions that contain both. Query by humming provides a natural means of content-based retrieval from music databases, and this research provides a robust front end for such applications. Each note segment in the humming waveform is modeled by phone-level HMMs. The duration of the note segment is then labeled by a duration model, and the pitch of the note is modeled by a pitch model based on a Gaussian mixture model. Preliminary real-time recognition experiments are carried out with models trained on data obtained from eight human subjects, and an overall correct recognition rate of around 84% is demonstrated.
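The three stages named in the abstract (note segmentation, duration labeling, pitch labeling) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: a two-state silence/note HMM decoded with Viterbi stands in for the phone-level HMMs, a single Gaussian per semitone stands in for the Gaussian mixture pitch model, duration is simply the frame count times an assumed frame length, and every numeric parameter and function name (`viterbi_segment`, `note_segments`, `label_pitch`, `transcribe`) is hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_segment(energies, stay=0.9):
    """Label each frame 0 (silence) or 1 (note) with a two-state HMM."""
    # Hypothetical (mean, variance) of frame log-energy for each state.
    emit = [(-6.0, 1.0), (-1.0, 1.0)]
    log_trans = [[math.log(stay), math.log(1 - stay)],
                 [math.log(1 - stay), math.log(stay)]]
    score = [math.log(0.5) + log_gauss(energies[0], *emit[s]) for s in (0, 1)]
    back = []
    for e in energies[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best predecessor state for arriving in state s at this frame.
            prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + log_gauss(e, *emit[s]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def note_segments(path):
    """Return (start, end) frame pairs, end exclusive, for runs of state 1."""
    segments, start = [], None
    for i, s in enumerate(path + [0]):  # trailing 0 closes a final note
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            segments.append((start, i))
            start = None
    return segments

def label_pitch(frame_pitches, candidates=range(48, 85), var=0.25):
    """Pick the MIDI note whose Gaussian best explains the frame pitches."""
    return max(candidates,
               key=lambda m: sum(log_gauss(p, m, var) for p in frame_pitches))

def transcribe(energies, pitches, frame_sec=0.01):
    """Return one (MIDI pitch, duration in seconds) pair per detected note."""
    path = viterbi_segment(energies)
    return [(label_pitch(pitches[a:b]), round((b - a) * frame_sec, 3))
            for a, b in note_segments(path)]

# Synthetic input: two hummed notes separated by silence.
energies = [-6.0] * 5 + [-1.0] * 20 + [-6.0] * 5 + [-1.0] * 10 + [-6.0] * 3
pitches = [0.0] * 5 + [60.2] * 20 + [0.0] * 5 + [62.1] * 10 + [0.0] * 3
print(transcribe(energies, pitches))  # [(60, 0.2), (62, 0.1)]
```

Each output pair combines a pitch label with a duration label, which is the sense in which the transcription is multidimensional. A real front end would train the emission, transition, and pitch parameters from hummed data rather than fixing them by hand as done here.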