Automatic transcription of tabla music

  • Authors: Chris Chafe; Parag Chordia
  • Affiliations: Stanford University; Stanford University
  • Venue: Automatic transcription of tabla music
  • Year: 2006

Abstract

We consider the problem of automatic tabla transcription, specifically, the reconstruction of a score-like notation that represents timbral categories and rhythmic values. To that end, a computer-based representation for tabla is proposed that allows for encoding, analysis, and typesetting. A transcription system consisting of modules for onset detection, stroke timbre recognition, and rhythm detection was created. Performance was evaluated on a large database taken from three performers under different recording conditions, containing a total of 16,834 strokes. First, the time-domain signal was segmented using complex-domain thresholding, which looks for sudden changes in amplitude and discontinuities in phase; 98% of onsets were detected against a 1% false-positive rate. Classification of strokes was performed using a maximum a posteriori (MAP) rule with a multivariate normal (MVN) likelihood distribution, and using non-parametric techniques such as probabilistic neural networks (PNN), feed-forward neural networks (FFNN), and tree-based classifiers. Two evaluation protocols were used. The first used 10-fold cross-validation; the recognition rate, averaged over many experiments containing 10-15 classes, was 92% for the MVN, 94% for the FFNN and PNN, and 84% for the tree-based classifier. To test generalization, a more difficult independent evaluation was undertaken in which no test strokes came from the same recording as the training strokes; the average recognition rate over a wide variety of test conditions was 76% for the MVN, 83% for the FFNN, 76% for the PNN, and 66% for the tree-based classifier. To determine rhythmic values for strokes, stroke durations were expressed in terms of the beat period (seconds/beat), which was estimated from the autocorrelation of the onset detection function as well as by a duration-histogram method. Quantization of durations was done by rounding to a discrete grid constructed from duple and triple divisions of the beat. Accurate rhythmic notation was demonstrated for five tabla phrases containing a total of 552 rhythmic values. Running the three modules in series as a full transcription system yielded good preliminary results on the five examples. Finally, we describe challenges and possible solutions for the development of a robust, fully automatic transcription system.
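The abstract's first module detects onsets by complex-domain thresholding, i.e., flagging frames where the observed spectrum deviates sharply from a prediction based on steady amplitude and phase. The sketch below is only an illustration of that general idea, not the thesis's implementation; the function name, FFT/hop sizes, and the fixed peak-picking threshold `delta` are assumptions chosen for the example (the thesis reports 98% detection at a 1% false-positive rate with its own thresholding).

```python
import numpy as np
from scipy.signal import stft, find_peaks

def complex_domain_onsets(x, sr, n_fft=1024, hop=256, delta=0.1):
    """Illustrative complex-domain onset detector (not the thesis's exact code).

    A target spectrum is predicted from the previous two frames (constant
    magnitude, linearly extrapolated phase); large total deviation from that
    prediction marks a candidate onset.
    """
    _, _, X = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(X)
    phase = np.angle(X)
    # Predicted spectrum: previous magnitude, phase advanced by the last phase increment.
    pred_phase = 2 * phase[:, 1:-1] - phase[:, :-2]
    pred = mag[:, 1:-1] * np.exp(1j * pred_phase)
    # Onset detection function: summed complex deviation per frame, normalized.
    odf = np.sum(np.abs(X[:, 2:] - pred), axis=0)
    odf = odf / (odf.max() + 1e-12)
    # Simple fixed-threshold peak picking; an adaptive threshold would be more robust.
    peaks, _ = find_peaks(odf, height=delta)
    onset_times = (peaks + 2) * hop / sr   # +2 accounts for the two-frame prediction lag
    return onset_times, odf
```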
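For the stroke-recognition module, the MAP rule with an MVN likelihood amounts to fitting one class-conditional Gaussian per stroke timbre and choosing the class maximizing prior times likelihood. The following is a minimal sketch of that decision rule under assumed inputs (a feature matrix `X` and stroke labels `y`); the class name, the ridge term on the covariance, and the feature set itself are not taken from the thesis.

```python
import numpy as np
from scipy.stats import multivariate_normal

class MAPGaussianClassifier:
    """MAP stroke classifier sketch: one multivariate normal per timbre class,
    priors from class frequencies; predicts argmax of prior * likelihood."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {}
        self.priors_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            # A small ridge keeps the covariance invertible for sparse classes.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.models_[c] = multivariate_normal(Xc.mean(axis=0), cov)
            self.priors_[c] = len(Xc) / len(X)
        return self

    def predict(self, X):
        # Log-posterior (up to a constant) for each class, then argmax.
        scores = np.column_stack(
            [np.log(self.priors_[c]) + self.models_[c].logpdf(X) for c in self.classes_]
        )
        return self.classes_[np.argmax(scores, axis=1)]
```

The non-parametric alternatives mentioned in the abstract (PNN, FFNN, tree-based classifiers) would replace this decision rule while keeping the same features and evaluation protocols.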
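The rhythm module estimates the beat period and then rounds stroke durations to a grid of duple and triple beat divisions. The sketch below shows one plausible reading of those two steps, assuming an onset detection function `odf` sampled every `hop` samples; the tempo bounds, grid depth, and function names are illustrative assumptions, and the thesis's duration-histogram method is not shown.

```python
import numpy as np

def estimate_beat_period(odf, hop, sr, min_bpm=60, max_bpm=360):
    """Beat period (seconds/beat) from the autocorrelation of the onset
    detection function: the strongest lag within an assumed tempo range."""
    odf = odf - odf.mean()
    ac = np.correlate(odf, odf, mode="full")[len(odf) - 1:]
    lags = np.arange(len(ac)) * hop / sr            # lag in seconds
    lo, hi = 60.0 / max_bpm, 60.0 / min_bpm
    valid = (lags >= lo) & (lags <= hi)
    return lags[valid][np.argmax(ac[valid])]

def quantize_durations(onset_times, beat_period, max_depth=3):
    """Round inter-onset intervals, expressed in beats, to the nearest point on
    a grid built from duple and triple divisions of the beat."""
    grid = np.array(sorted(
        {0.0}
        | {1 / 2 ** d for d in range(max_depth + 1)}      # 1, 1/2, 1/4, 1/8
        | {1 / (3 * 2 ** d) for d in range(max_depth)}    # 1/3, 1/6, 1/12
    ))
    iois = np.diff(onset_times) / beat_period       # durations in beats
    whole = np.floor(iois)
    frac = iois - whole
    # Snap the fractional part of each duration to the nearest grid point.
    return whole + grid[np.abs(frac[:, None] - grid).argmin(axis=1)]
```

Chaining the three sketches (onset times, stroke labels, quantized durations) mirrors the series arrangement of modules described for the full transcription system.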