Multi-pitch analysis of concurrent sound sources is an important but challenging problem. It requires estimating the pitch values of all harmonic sources in individual frames and streaming the pitch estimates into trajectories, each of which corresponds to one source. We address the streaming problem for monophonic sound sources. Our method takes the original audio, together with frame-level pitch estimates from any multi-pitch estimation algorithm, as input and outputs a pitch trajectory for each source. It does not require pre-training of source models on isolated recordings. Instead, it casts the problem as constrained clustering, where each cluster corresponds to a source and the objective is to minimize the timbre inconsistency within each cluster. We explore different timbre features for music and speech: for music, the harmonic structure and a newly proposed feature called the uniform discrete cepstrum (UDC) are effective, while for speech, MFCCs and UDC work well. We also show that timbre consistency alone is insufficient for effective streaming, so constraints are imposed on pairs of pitch estimates according to their time-frequency relationships. We propose a new constrained clustering algorithm that satisfies as many constraints as possible while optimizing the clustering objective. We compare the proposed approach with state-of-the-art supervised and unsupervised multi-pitch streaming approaches that are specifically designed for music or speech, and show better or comparable results.
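To illustrate the constrained-clustering formulation, the sketch below implements the classic hard-constraint COP-KMeans algorithm (Wagstaff et al., 2001) on hypothetical per-pitch-estimate timbre feature vectors with must-link and cannot-link pairs. This is a minimal illustration of the framework only; the paper's own algorithm differs in that it treats constraints softly (satisfying as many as possible rather than all) and uses timbre inconsistency as its objective. All names and parameters here are assumptions for the sketch.

```python
import numpy as np

def violates(i, c, labels, must_link, cannot_link):
    """Check whether assigning item i to cluster c breaks any constraint
    given the partial assignment in labels (-1 means unassigned)."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] != -1 and labels[j] != c:
            return True  # must-link partner already sits in another cluster
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == c:
            return True  # cannot-link partner already sits in this cluster
    return False

def cop_kmeans(features, k, must_link, cannot_link, n_iter=50, seed=0):
    """Hard-constraint k-means (COP-KMeans style, illustrative sketch).

    features:    (n, d) array, one timbre feature vector per pitch estimate
    k:           number of sources (clusters)
    must_link:   (i, j) pairs that must share a cluster
    cannot_link: (i, j) pairs that must not share a cluster
    Returns cluster labels, or None if no consistent assignment is found.
    """
    rng = np.random.default_rng(seed)
    n = len(features)
    centers = features[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        new_labels = np.full(n, -1)
        for i in range(n):
            # Try clusters in order of increasing distance to their center.
            order = np.argsort(np.linalg.norm(centers - features[i], axis=1))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                return None  # hard constraints unsatisfiable under this order
        if np.array_equal(new_labels, labels):
            break  # converged
        labels = new_labels
        for c in range(k):  # recompute cluster centers
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return labels
```

In the streaming setting, a cannot-link pair would correspond to two pitch estimates in the same time frame (one monophonic source cannot produce both), and a must-link pair to estimates in adjacent frames with very close pitch values.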