Multi-stream Fusion for Speaker Classification

  • Authors:
  • Izhak Shafran

  • Affiliations:
  • Computer Science & Electrical Engineering, OGI School of Science & Engineering, Oregon Health & Science University (OHSU), 20000 NW Walker Rd, Beaverton, OR 97006

  • Venue:
  • Speaker Classification I
  • Year:
  • 2007

Abstract

Accurate detection of speaker traits has clear benefits in improving speech interfaces, in finding useful information in multi-media archives, and in medical applications. Humans infer a variety of traits, robustly and effortlessly, from the available sources of information, which may include vision and gestures in addition to voice. This paper examines techniques for integrating information from multiple sources, which may be broadly categorized into those operating in feature space, model space, score space and kernel space. Integration in feature space and model space has been studied extensively in the audio-visual literature, so here we focus on score space and kernel space. There is a large number of potential schemes for integration in kernel space, and here we examine a particular instance that can integrate both acoustic and lexical information for affect recognition. The example is taken from a widely deployed real-world application. We compare the kernel-based classifier with other competing techniques and demonstrate how it provides a general and flexible framework for detecting speaker characteristics.
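
To make kernel-space fusion concrete, the following is a minimal Python sketch of one standard scheme, not the paper's actual system: a convex combination of a hypothetical RBF kernel over acoustic features and a hypothetical linear kernel over lexical features, used as a precomputed Gram matrix for a support-vector classifier. The kernel choices, the mixing weight alpha, and the toy data are all illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def rbf_kernel_matrix(X, gamma=0.1):
        # Gram matrix of a Gaussian (RBF) kernel: exp(-gamma * ||x_i - x_j||^2).
        sq = np.sum(X ** 2, axis=1)
        dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        return np.exp(-gamma * dist2)

    def linear_kernel_matrix(X):
        # Gram matrix of a linear kernel: plain inner products.
        return X @ X.T

    # Toy stand-ins for per-utterance features; a real system would use
    # acoustic statistics (e.g. pitch, energy) and lexical (word-based) features.
    rng = np.random.default_rng(0)
    X_acoustic = rng.normal(size=(40, 12))
    X_lexical = rng.normal(size=(40, 100))
    y = rng.integers(0, 2, size=40)  # binary affect label for illustration

    # Kernel-space fusion: a convex combination of valid kernels is itself
    # a valid kernel, so one classifier can draw on both streams at once.
    alpha = 0.5  # mixing weight; in practice tuned on held-out data
    K = alpha * rbf_kernel_matrix(X_acoustic) + (1.0 - alpha) * linear_kernel_matrix(X_lexical)

    clf = SVC(kernel="precomputed")
    clf.fit(K, y)
    print("training accuracy:", clf.score(K, y))

The design choice that makes this work is that a nonnegative weighted sum of positive semi-definite kernels is again positive semi-definite, so the combined matrix remains a valid kernel regardless of how heterogeneous the underlying feature streams are.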