Characteristics-based effective applause detection for meeting speech

  • Authors:
  • Yan-Xiong Li;Qian-Hua He;Sam Kwong;Tao Li;Ji-Chen Yang

  • Affiliations:
  • School of Electronic and Information Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510640, Guangdong Province, China;School of Electronic and Information Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510640, Guangdong Province, China;Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong, China;School of Electronic and Information Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510640, Guangdong Province, China;School of Electronic and Information Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510640, Guangdong Province, China

  • Venue:
  • Signal Processing
  • Year:
  • 2009

Quantified Score

Hi-index 0.08

Abstract

Applause frequently occurs in multi-participant meeting speech, and detecting it is important for meeting speech recognition, semantic inference, highlight extraction, and similar tasks. In this paper, we first study the characteristic differences between applause and speech, such as duration, pitch, spectrogram, and occurrence location. We then propose an effective algorithm based on these characteristics for detecting applause in a meeting speech stream. In the algorithm, non-silence signal segments are first extracted using voice activity detection. Applause segments are then detected from the non-silence segments based on the characteristic differences between applause and speech, without using any complex statistical models such as hidden Markov models. The proposed algorithm accurately determines the boundaries of applause in a meeting speech stream and is computationally efficient. In addition, it can extract applause sub-segments from mixed segments. Experimental evaluations show that the proposed algorithm achieves satisfactory results in detecting applause in meeting speech: the precision rate, recall rate, and F1-measure are 94.34%, 98.04%, and 96.15%, respectively. Compared with the traditional algorithm under the same experimental conditions, it improves F1-measure by 3.62% and saves about 35.78% of computational time.
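
The reported F1-measure can be checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not state the formula, and the function name `f1_measure` is a hypothetical helper introduced here for illustration.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1].

    Assumption: this is the standard F1 definition; the abstract does not
    spell out the formula it uses.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.9434
reported_recall = 0.9804

f1 = f1_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 96.15%"
```

Under this definition the three reported figures are mutually consistent.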