A statistical multidimensional humming transcription using phone level hidden Markov models for query by humming systems

  • Authors:
  • Hsuan-Huei Shih;S. S. Narayanan;C.-C. J. Kuo

  • Affiliations:
  • Integrated Media Syst. Center, Southern California Univ., Los Angeles, CA, USA;Integrated Media Syst. Center, Southern California Univ., Los Angeles, CA, USA;Integrated Media Syst. Center, Southern California Univ., Los Angeles, CA, USA

  • Venue:
  • ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
  • Year:
  • 2003

Abstract

This research proposes a new phone-level hidden Markov model (HMM) approach to human humming transcription. A music note has two important attributes: pitch and duration. The proposed system generates multidimensional humming transcriptions, which contain both pitch and duration information. Query by humming provides a natural means of content-based retrieval from music databases, and this research provides a robust front-end for such an application. Each note segment in the humming waveform is modeled by phone-level HMMs. The duration of the note segment is then labeled by a duration model, and the pitch of the note is modeled by a pitch model based on a Gaussian mixture model. Preliminary real-time recognition experiments are carried out with models trained on data obtained from eight human subjects, and an overall correct recognition rate of around 84% is demonstrated.
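
To make the idea of a multidimensional (pitch plus duration) transcription concrete, the following is a minimal sketch and not the authors' implementation. It assumes the note boundaries have already been produced by some segmenter (the paper uses phone-level HMMs for this step), and it replaces the paper's Gaussian mixture pitch model with a single Gaussian per semitone. All function names, the 10 ms frame length, the 0.5 s beat, and the set of allowed note values are hypothetical choices made for illustration only.

```python
import math

def log_gauss(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def label_pitch(f0_frames, candidates=range(48, 85), std=0.5):
    """Return the MIDI note whose Gaussian best explains the voiced frames.

    Assumption: one Gaussian per semitone, centered on the MIDI number,
    standing in for the Gaussian mixture pitch model in the abstract.
    """
    # Convert voiced frames (f0 > 0) from Hz to fractional MIDI semitones.
    semis = [69 + 12 * math.log2(f / 440.0) for f in f0_frames if f > 0]
    if not semis:
        return None
    return max(candidates, key=lambda m: sum(log_gauss(s, m, std) for s in semis))

def label_duration(n_frames, frame_sec=0.01, beat_sec=0.5):
    """Quantize a segment length to the nearest note value (in beats) on a log scale."""
    if n_frames <= 0:
        return None
    beats = n_frames * frame_sec / beat_sec
    values = [0.25, 0.5, 1.0, 2.0, 4.0]  # hypothetical set of allowed note values
    return min(values, key=lambda v: abs(math.log2(beats / v)))

def transcribe(segments, f0):
    """Map (start, end) frame segments to (pitch, duration) pairs.

    Assumption: `segments` comes from an upstream note segmenter,
    such as the HMM-based one described in the abstract.
    """
    return [(label_pitch(f0[s:e]), label_duration(e - s)) for s, e in segments]

# Example: two hummed notes (A4, then C5) separated by a short silence.
f0_track = [440.0] * 50 + [0.0] * 10 + [523.25] * 100
notes = transcribe([(0, 50), (60, 160)], f0_track)
```

Each tuple in `notes` pairs a pitch label with a duration label, which is the kind of two-attribute output the abstract calls a multidimensional transcription. A real front-end would learn the pitch and duration model parameters from hummed training data rather than fixing them by hand.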