Speech emission control using active cancellation

  • Authors:
  • Kazuhiro Kondo;Kiyoshi Nakagawa

  • Affiliations:
  • Department of Electrical Engineering, Faculty of Engineering, Yamagata University, 4-3-16 Jonan, Yonezawa, Yamagata 992-8510, Japan (both authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2007

Abstract

We investigated the feasibility of an active cancellation system for controlling unnecessary speech radiation. Intended applications include cancellation of cellular phone speech and speech input for recognition-based dictation systems. Neither application requires speech to be radiated into the surrounding space, only into the input microphone, and both would benefit if global radiation were controlled. We first show that speech cancellation is possible with a secondary source placed close to the mouth that generates linear-predicted, phase-inverted speech. However, the prediction must also cover the long delay associated with acoustic-to-electric and electric-to-acoustic conversion, A/D and D/A conversion, and all associated processing, which we found can be as long as 3 ms. By recursively using LPC-predicted samples to predict further samples, we found that prediction with an SNR of about 6 dB is possible even with this long delay. The prediction coefficient update is suppressed during this recursion. Lowering the sampling frequency, which reduces the number of samples to be predicted at the cost of reduced bandwidth, further improves prediction accuracy. At a sampling frequency of 8 kHz, speech emission control of about 7 dB for female speech and 4 dB for male speech was found to be possible. Finally, we experimentally evaluated the proposed active speech control method. Predicted samples of recorded speech were first prepared offline. We then played the original and the predicted samples simultaneously from two loudspeakers. We found that (1) speech cancellation of up to about 10 dB is possible but is highly speaker dependent, and (2) the secondary loudspeaker should be oriented in the same direction as the primary source, i.e., the mouth. We plan to investigate prediction coefficient extrapolation to further improve prediction accuracy. A prototype system implementation using DSPs is also planned.
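
The recursive prediction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the autocorrelation-method LPC estimate, the small diagonal regularization, and every function name (`delay_in_samples`, `lpc_coefficients`, `predict_ahead`, `snr_db`) are assumptions made for illustration. The sketch only shows the general idea of feeding predicted samples back into a predictor whose coefficients are held fixed, so that the prediction spans the whole system delay (24 samples for 3 ms at 8 kHz).

```python
import numpy as np


def delay_in_samples(delay_s, fs_hz):
    """Number of samples the predictor must look ahead to cover the delay."""
    return int(round(delay_s * fs_hz))


def lpc_coefficients(frame, order):
    """Estimate LPC coefficients a_1..a_p with the autocorrelation method.

    Assumption: the normal equations R a = r are solved directly; the
    abstract does not say which estimation method was used.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix, lightly regularized for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1 : order + 1])


def predict_ahead(history, coeffs, steps):
    """Predict `steps` future samples recursively.

    Each predicted sample is fed back as input for the next prediction.
    The coefficients are held fixed during the recursion, mirroring the
    suppressed coefficient update mentioned in the abstract.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    order = len(coeffs)
    # Most recent `order` samples, oldest first.
    buf = np.asarray(history, dtype=float)[-order:].copy()
    out = []
    for _ in range(steps):
        # x_hat[n] = sum_{k=1..p} a_k * x[n - k]
        nxt = float(np.dot(coeffs, buf[::-1]))
        out.append(nxt)
        buf = np.append(buf[1:], nxt)
    return np.array(out)


def snr_db(reference, predicted):
    """Prediction SNR in dB: signal power over prediction-error power."""
    reference = np.asarray(reference, dtype=float)
    error = reference - np.asarray(predicted, dtype=float)
    return 10.0 * np.log10(np.sum(reference**2) / (np.sum(error**2) + 1e-12))
```

A typical use would be to estimate the coefficients on the most recent frame, call `predict_ahead(frame, coeffs, delay_in_samples(3e-3, 8000))`, and invert the sign of the result before playing it from the secondary source. Halving the sampling frequency halves the number of recursive steps, which is consistent with the accuracy gain reported at 8 kHz.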