Fundamentals of speech recognition
Separation of harmonic sound sources using sinusoidal modeling
ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 2
Multipitch estimation and sound separation by the spectral smoothness principle
ICASSP '01: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 5
Musical sound separation based on binary time-frequency masking
EURASIP Journal on Audio, Speech, and Music Processing
Monaural musical sound separation based on pitch and common amplitude modulation
IEEE Transactions on Audio, Speech, and Language Processing
Discriminant feature analysis for music timbre recognition and automatic indexing
MCD'07 Proceedings of the 3rd ECML/PKDD international conference on Mining complex data
Playing in unison in the random forest
SIIS'11 Proceedings of the 2011 international conference on Security and Intelligent Information Systems
Detecting multiple pitches (F0s) and segregating musical instrument lines from monaural recordings of contrapuntal polyphonic music into separate tracks is a difficult problem in music signal processing. Applications include audio-to-MIDI conversion, automatic music transcription, and audio enhancement and transformation. Past attempts at separation have been limited to separating two harmonic signals in a contrapuntal duet (Maher, 1990) or several harmonic signals in a single chord (Virtanen and Klapuri, 2001, 2002). Several researchers have attempted polyphonic pitch detection (Klapuri, 2001; Eggink and Brown, 2004a), predominant melody extraction (Goto, 2001; Marolt, 2004; Eggink and Brown, 2004b), and instrument recognition (Eggink and Brown, 2003). Our solution assumes that each instrument is represented as a time-varying harmonic series and that errors can be corrected using prior knowledge of instrument spectra. Fundamental frequencies (F0s) for each time frame are estimated from the input spectral data using an Expectation-Maximization (EM) algorithm in which Gaussian distributions represent the harmonic series. Collisions (i.e., overlaps) between instrument harmonics, which occur frequently, are predicted from the estimated F0s. The uncollided harmonics are matched against a pre-stored spectrum library so that each F0's harmonic series can be assigned to the appropriate instrument. Corrupted harmonics are then restored using data from the library. Finally, each voice is additively resynthesized to a separate track. The algorithm is demonstrated on a monaural signal containing three contrapuntal musical instrument voices with distinct timbres.
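Two of the steps above, predicting harmonic collisions from the estimated F0s and additively resynthesizing each voice, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `harmonic_collisions` and `resynthesize`, the harmonic count `n_harmonics`, and the frequency tolerance `tol_hz` are all hypothetical choices for this sketch.

```python
import numpy as np

def harmonic_collisions(f0s, n_harmonics=20, tol_hz=10.0):
    """Predict collisions (overlapping harmonics) between voices in one frame.

    f0s: estimated fundamental frequencies in Hz, one per voice.
    Returns tuples (voice_i, harmonic_i, voice_j, harmonic_j) whose
    harmonic frequencies lie within tol_hz of each other.
    """
    collisions = []
    for i in range(len(f0s)):
        for j in range(i + 1, len(f0s)):
            for h_i in range(1, n_harmonics + 1):
                for h_j in range(1, n_harmonics + 1):
                    if abs(h_i * f0s[i] - h_j * f0s[j]) < tol_hz:
                        collisions.append((i, h_i, j, h_j))
    return collisions

def resynthesize(f0_track, amp_tracks, sr=44100):
    """Additively resynthesize one voice from its time-varying harmonic series.

    f0_track: per-sample F0 in Hz; amp_tracks: array of shape
    (n_harmonics, n_samples) holding each harmonic's amplitude envelope.
    """
    # Running phase of the fundamental; harmonic h uses h times this phase.
    phase = 2.0 * np.pi * np.cumsum(f0_track) / sr
    out = np.zeros_like(f0_track, dtype=float)
    for h, amps in enumerate(amp_tracks, start=1):
        out += amps * np.sin(h * phase)
    return out
```

For example, with two voices at 200 Hz and 300 Hz, the third harmonic of the first voice and the second harmonic of the second both fall at 600 Hz, so a collision is predicted there and at every further multiple of 600 Hz.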