Cocktail party processing

  • Authors:
  • DeLiang Wang; Guoning Hu

  • Affiliations:
  • Department of Computer Science and Engineering & Center for Cognitive Science, The Ohio State University, Columbus, OH; Biophysics Program, The Ohio State University, Columbus, OH

  • Venue:
  • WCCI'08: Proceedings of the 2008 IEEE World Conference on Computational Intelligence: Research Frontiers
  • Year:
  • 2008

Abstract

Speech segregation, or the cocktail party problem, has proven to be an extremely challenging problem. This chapter describes a computational auditory scene analysis (CASA) approach to the cocktail party problem. This monaural approach performs auditory segmentation and grouping in a two-dimensional time-frequency representation that encodes proximity in frequency and time, periodicity, amplitude modulation, and onset/offset. In segmentation, our model decomposes the input mixture into contiguous time-frequency segments. Grouping is first performed for voiced speech, where detected pitch contours are used to group voiced segments into a target stream and the background. In grouping voiced speech, resolved and unresolved harmonics are handled differently. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. This CASA approach has led to major advances towards solving the cocktail party problem.
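The abstract does not spell out the grouping rules. As a minimal sketch under stated assumptions, the Python below illustrates one common way a detected pitch can be applied to a single time-frequency unit: the unit is labeled target-dominant when its normalized autocorrelation at the pitch lag is high, using the filter response directly for resolved harmonics and its mean-removed amplitude envelope for unresolved ones. The threshold `theta = 0.85`, the FFT-based envelope, the 100 ms frame, the `unresolved` flag, and all function names are hypothetical choices for illustration, not the authors' implementation. The auditory filterbank and pitch tracker the model would need are omitted, and synthetic tones stand in for filter outputs.

```python
import numpy as np


def normalized_autocorr(x: np.ndarray, lag: int) -> float:
    """Normalized autocorrelation of a 1-D frame x at a positive lag (samples)."""
    if lag <= 0 or lag >= len(x):
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    a, b = x[:-lag], x[lag:]
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0


def hilbert_envelope(x: np.ndarray) -> np.ndarray:
    """Amplitude envelope from an FFT-based analytic signal (numpy only)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0  # double positive frequencies, zero negative ones
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def label_unit(frame, pitch_lag: int, unresolved: bool = False,
               theta: float = 0.85) -> bool:
    """Return True if this (hypothetical) T-F unit looks target-dominant.

    Resolved harmonics: test the filter response itself at the pitch lag.
    Unresolved harmonics: test the envelope, which fluctuates at F0.
    theta = 0.85 is an illustrative threshold, not a published value.
    """
    signal = np.asarray(frame, dtype=float)
    if unresolved:
        env = hilbert_envelope(signal)
        signal = env - env.mean()  # remove DC so it cannot dominate the score
    return normalized_autocorr(signal, pitch_lag) > theta


if __name__ == "__main__":
    fs, f0 = 16000, 200.0                 # sample rate and detected target pitch
    lag = int(round(fs / f0))             # pitch period: 80 samples
    t = np.arange(1600) / fs              # one 100 ms frame
    # Resolved channel: a harmonic of 200 Hz vs. an unrelated 330 Hz tone.
    print(label_unit(np.cos(2 * np.pi * 400 * t), lag))   # True
    print(label_unit(np.cos(2 * np.pi * 330 * t), lag))   # False
    # Unresolved channel: harmonics of 200 Hz vs. harmonics of 230 Hz.
    tgt = sum(np.cos(2 * np.pi * f * t) for f in (2000, 2200, 2400))
    bg = sum(np.cos(2 * np.pi * f * t) for f in (2070, 2300, 2530))
    print(label_unit(tgt, lag, unresolved=True))          # True
    print(label_unit(bg, lag, unresolved=True))           # False
```

Switching to the envelope in unresolved channels reflects the abstract's point that the representation encodes amplitude modulation and that unresolved harmonics are treated differently from resolved ones. Labeled units would then be merged with the segments from the segmentation stage to form the target stream.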