PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise

Authors:
Sira Gonzalez; Mike Brookes
Affiliations:
Department of Electrical and Electronic Engineering, Imperial College London, London, UK; Department of Electrical and Electronic Engineering, Imperial College London, London, UK
Venue:
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Year:
2014

Abstract

We present PEFAC, a fundamental frequency estimation algorithm for speech that identifies voiced frames and estimates pitch reliably even at negative signal-to-noise ratios. The algorithm combines a normalization stage, which removes channel dependency and attenuates strong noise components, with a harmonic summing filter applied in the log-frequency power spectral domain. The impulse response of this filter is chosen to sum the energy of the harmonics of the fundamental frequency while attenuating smoothly varying noise components. Temporal continuity constraints are applied to the selected pitch candidates, and a voiced speech probability is computed from the likelihood ratio of two classifiers, one for voiced speech and one for unvoiced speech or silence. We compare our algorithm with other widely used algorithms and demonstrate that it performs well at both high and low levels of additive noise.
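
The harmonic summing idea in the abstract can be illustrated with a small sketch. On a log-frequency axis the k-th harmonic of any fundamental sits at a fixed offset of log2(k) octaves above it, so summing harmonic energy for every candidate pitch reduces to a single correlation with one fixed kernel. The code below is a simplified illustration and not the filter from the paper: the kernel shape (unit impulses at the harmonic offsets with the mean subtracted so that a flat spectrum scores zero), the function names, and the parameters `f_min`, `f_max`, `bins_per_octave` and `num_harmonics` are all assumptions made for this example. The normalization stage, the temporal continuity constraints and the voicing classifiers are omitted.

```python
import numpy as np

def log_frequency_axis(f_min=50.0, f_max=4000.0, bins_per_octave=48):
    """Centre frequencies (Hz) spaced uniformly in log2-frequency (assumed grid)."""
    n_bins = int(np.floor(np.log2(f_max / f_min) * bins_per_octave)) + 1
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)

def harmonic_kernel(num_harmonics=10, bins_per_octave=48):
    """Zero-mean comb with impulses at the log-frequency offsets of harmonics 1..K.

    Hypothetical stand-in for the paper's impulse response: subtracting the
    mean makes the response to a constant (flat) spectrum equal to zero.
    """
    harmonics = np.arange(1, num_harmonics + 1)
    offsets = np.unique(np.round(np.log2(harmonics) * bins_per_octave).astype(int))
    kernel = np.zeros(offsets[-1] + 1)
    kernel[offsets] = 1.0
    return kernel - kernel.mean()

def harmonic_sum(power_spec, kernel):
    """Score each candidate bin b by sum_j power_spec[b + j] * kernel[j]."""
    return np.correlate(power_spec, kernel, mode="valid")

def estimate_f0(power_spec, freqs, kernel):
    """Return the candidate frequency with the largest harmonic-sum score."""
    scores = harmonic_sum(power_spec, kernel)
    return freqs[int(np.argmax(scores))]

# Toy frame: flat noise floor plus ten harmonics of a 200 Hz fundamental.
freqs = log_frequency_axis()
kernel = harmonic_kernel()
spec = np.ones_like(freqs)
for k in range(1, 11):
    spec[int(round(48 * np.log2(k * 200.0 / 50.0)))] += 10.0
print(estimate_f0(spec, freqs, kernel))
```

In this toy frame the highest score should fall at the bin for 200 Hz, because only that candidate lines up with all ten harmonic peaks, while the sub-harmonic at 100 Hz and the octave at 400 Hz each match about half of them. The flat noise floor contributes nothing to any score, which is a crude version of the attenuation of smoothly varying noise that the abstract describes.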