Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions

Authors:
Zhiyao Duan; Bryan Pardo; Changshui Zhang
Affiliations:
Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL;Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL;State Key Lab of Intelligent Technologies and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Automation–Tsinghua University, Beijing, China
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

This paper presents a maximum-likelihood approach to multiple fundamental frequency (F0) estimation for a mixture of harmonic sound sources, where the power spectrum of a time frame is the observation and the F0s are the parameters to be estimated. The likelihood model covers both spectral peaks and non-peak regions (frequencies further than a musical quarter tone from all observed peaks). The peak likelihood and the non-peak region likelihood are shown to act as a complementary pair: the former helps find F0s whose harmonics explain the observed peaks, while the latter helps avoid F0s that have harmonics in non-peak regions. The parameters of these models are learned from monophonic and polyphonic training data. To avoid the combinatorial problem of estimating all F0s concurrently, the paper proposes an iterative greedy search strategy that estimates F0s one by one, together with a polyphony estimation method that terminates the iterative process. It also proposes a postprocessing method that refines the polyphony and F0 estimates using neighboring frames. The relative contributions of the different components of the proposed method are analyzed, and the refinement component is shown to eliminate many inconsistent estimation errors. The method is evaluated on ten recorded four-part J. S. Bach chorales. Results show that it outperforms two state-of-the-art algorithms in both F0 estimation and polyphony estimation.
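
The interplay between the non-peak region and the one-by-one greedy search can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the likelihood functions, so `toy_score` is a hypothetical stand-in that simply rewards peaks explained by a harmonic and penalizes harmonics that land in the non-peak region. The function names, the fixed number of harmonics, and the stopping rule (stop when the score no longer improves) are all assumptions made for illustration; only the quarter-tone definition of the non-peak region and the greedy one-at-a-time structure come from the abstract.

```python
import math

QUARTER_TONE_CENTS = 50.0  # a quarter tone is half a semitone, i.e. 50 cents


def cents_between(f1, f2):
    """Absolute distance between two frequencies, in cents."""
    return abs(1200.0 * math.log2(f1 / f2))


def in_non_peak_region(freq, peak_freqs):
    """True if freq is further than a quarter tone from every observed peak."""
    return all(cents_between(freq, p) > QUARTER_TONE_CENTS for p in peak_freqs)


def toy_score(f0_set, peak_freqs, n_harmonics=5):
    """Hypothetical stand-in for the likelihood (NOT the paper's model).

    +1 for each observed peak within a quarter tone of some harmonic,
    -1 for each harmonic that falls in the non-peak region.
    """
    harmonics = [h * f0 for f0 in f0_set for h in range(1, n_harmonics + 1)]
    explained = sum(
        1
        for p in peak_freqs
        if any(cents_between(p, h) <= QUARTER_TONE_CENTS for h in harmonics)
    )
    unsupported = sum(1 for h in harmonics if in_non_peak_region(h, peak_freqs))
    return explained - unsupported


def greedy_f0_search(candidates, peak_freqs, max_polyphony, score=toy_score):
    """Add one F0 per iteration, always taking the best-scoring candidate.

    Assumed stopping rule: stop when the best candidate no longer raises
    the score, or when max_polyphony F0s have been chosen.
    """
    chosen = []
    current = score(chosen, peak_freqs)
    for _ in range(max_polyphony):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda c: score(chosen + [c], peak_freqs))
        best_score = score(chosen + [best], peak_freqs)
        if best_score <= current:
            break
        chosen.append(best)
        current = best_score
    return chosen
```

For example, with peaks at the first five harmonics of 220 Hz and 330 Hz, calling `greedy_f0_search([110, 220, 330, 440, 550], peaks, max_polyphony=4)` evaluates at most a handful of candidates per iteration rather than every subset of candidates, which is the combinatorial cost the greedy strategy is meant to avoid. The penalty term is what keeps the sub-octave candidate 110 Hz from being preferred: its harmonics at 110 Hz and 550 Hz fall in the non-peak region.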