Separating voices in polyphonic music: a contig mapping approach

  • Authors:
  • Elaine Chew; Xiaodan Wu

  • Affiliations:
  • Epstein Department of Industrial and Systems Engineering, University of Southern California, Viterbi School of Engineering, Integrated Media Systems Center, Los Angeles, California (both authors)

  • Venue:
  • CMMR'04 Proceedings of the Second International Conference on Computer Music Modeling and Retrieval
  • Year:
  • 2004

Abstract

Voice separation is a critical component of music information retrieval, music analysis, and automated transcription systems. We present a contig mapping approach to voice separation based on perceptual principles. The algorithm runs in O(n²) time, uses only pitch height and event boundaries, and requires no user-defined parameters. The method segments a piece into contigs according to voice count, then reconnects fragments in adjacent contigs using a shortest-distance strategy. The order of connection is by distance from maximal voice contigs, where the voice ordering is known. This contig-mapping algorithm has been implemented in VoSA, a Java-based voice separation analyzer. Applied to J. S. Bach's Two- and Three-Part Inventions and the forty-eight fugues of the Well-Tempered Clavier, the algorithm achieved an overall average fragment consistency of 99.75%, a correct fragment connection rate of 94.50%, and an average voice consistency of 88.98%, metrics we propose for measuring voice separation performance.
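
The abstract describes two stages: slicing the piece into contigs wherever the number of sounding voices changes, and reconnecting the fragments of adjacent contigs by shortest pitch distance, working outward from the maximal-voice contigs where voice order is unambiguous. The sketch below illustrates only these two ideas under simplified assumptions; the Note model, the event slicing, and the greedy pitch-distance matching are hypothetical stand-ins, not the paper's VoSA implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    onset: float   # start time in beats
    offset: float  # end time in beats
    pitch: int     # MIDI pitch height

def voice_count_segments(notes):
    """Slice the piece at every note onset/offset and record the notes
    sounding in each slice, ordered top-down by pitch. In the paper,
    maximal runs of slices with a constant voice count form the contigs."""
    times = sorted({t for n in notes for t in (n.onset, n.offset)})
    segments = []
    for start, end in zip(times, times[1:]):
        sounding = tuple(sorted(
            (n for n in notes if n.onset < end and n.offset > start),
            key=lambda n: -n.pitch))
        if sounding:
            segments.append((start, end, sounding))
    return segments

def connect_by_pitch_distance(upper, lower):
    """Greedily pair fragments of two adjacent contigs by minimal pitch
    distance -- a simplified stand-in for the paper's shortest-distance
    connection strategy."""
    pairs = []
    free = list(range(len(lower)))
    for i, frag in enumerate(upper):
        if not free:
            break
        j = min(free, key=lambda k: abs(frag.pitch - lower[k].pitch))
        free.remove(j)
        pairs.append((i, j))
    return pairs

if __name__ == "__main__":
    # Toy example: a held bass note under a moving upper line.
    piece = [
        Note(0, 4, 48),
        Note(0, 1, 72), Note(1, 2, 74), Note(2, 3, 76), Note(3, 4, 77),
    ]
    segs = voice_count_segments(piece)
    for start, end, frag in segs:
        print(f"[{start},{end}) voices={len(frag)} pitches={[n.pitch for n in frag]}")
    # Connect the fragments of the first two slices.
    print(connect_by_pitch_distance(segs[0][2], segs[1][2]))
```

In this toy example the greedy matching links the moving upper notes to each other and the sustained bass to itself, which is the intuition behind connecting fragments by pitch proximity; the actual algorithm additionally orders these connections by distance from the maximal-voice contigs.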