A dictionary based approach for robust and syllable-independent audio input transcription for query by humming systems

Authors:
Erdem Unal; Shrikanth Narayanan; Elaine Chew; Panayiotis G. Georgiou; Nathan Dahlin
Affiliations:
University of Southern California, CA; University of Southern California, CA; University of Southern California, CA; University of Southern California, CA; University of Southern California, CA
Venue:
Proceedings of the 1st ACM workshop on Audio and music computing multimedia
Year:
2006

Abstract

Transcription from audio to musical representation is a challenging problem for Query by Humming (QBH) systems. In this paper, we propose a two-step note transcription process: an algorithm that uses a speech recognizer for note segmentation, followed by signal processing for robust location and capture of pitch and duration in the humming audio input. In contrast to most Hidden Markov Model-based approaches to QBH, which model and classify humming with a single universal model, we design a flexible speech recognizer that accepts different types of humming sounds in the input, providing efficient and accurate note segmentation and transcription. We use novel statistical energy and pitch analyses to correct potential insertion and deletion errors and so augment the system's performance, and we evaluate our algorithm with precision and recall tests. Using a large, previously collected database, we test various system configurations and report note segmentation results with and without the proposed augmentations. The augmented system robustly locates hummed notes, with an F-measure (combining precision and recall) of 0.84. As a second validation, we use the output of the transcription system for melody retrieval and find, for a database of 1000 melodies, 76% retrieval accuracy with automatically extracted queries and 83% with manually transcribed queries. Sensitivity analysis shows that, while it is possible to locate the hummed notes accurately, incorrect segmentation results can have a negative effect on the retrieval performance of the QBH system.
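
The precision, recall, and F-measure evaluation of note segmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how detected notes are matched to reference notes, so everything specific here is an assumption made for illustration: the function names, the greedy one-to-one matching of onset times, and the 50 ms tolerance. Only the F-measure formula itself (the harmonic mean of precision and recall) is standard.

```python
def match_onsets(reference, detected, tolerance=0.05):
    """Count detected onsets that fall within `tolerance` seconds of a
    reference onset, pairing each reference onset at most once.
    The matching rule and the tolerance are assumptions, not the paper's."""
    reference = sorted(reference)
    detected = sorted(detected)
    hits = 0
    i = j = 0
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            hits += 1      # correctly located note onset
            i += 1
            j += 1
        elif diff < 0:
            j += 1         # detected onset with no reference: insertion error
        else:
            i += 1         # reference onset never detected: deletion error
    return hits


def precision_recall_f(reference, detected, tolerance=0.05):
    """Return (precision, recall, F-measure) for a set of detected onsets."""
    hits = match_onsets(reference, detected, tolerance)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical onset times in seconds: one spurious note at 0.75 s
# and one missed note at 1.5 s.
reference_onsets = [0.0, 0.5, 1.0, 1.5]
detected_onsets = [0.02, 0.48, 0.75, 1.01]
precision, recall, f_measure = precision_recall_f(reference_onsets, detected_onsets)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

In this toy example an insertion error lowers precision and a deletion error lowers recall, which is why correcting both kinds of error, as the abstract describes, raises the combined F-measure.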