An Adaptive BIC Approach for Robust Speaker Change Detection in Continuous Audio Streams

Authors:
Janez Žibert; Andrej Brodnik; France Mihelič
Affiliations:
Primorska Institute of Natural Sciences and Technology, University of Primorska, Koper, Slovenia 6000; Primorska Institute of Natural Sciences and Technology, University of Primorska, Koper, Slovenia 6000; Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia 1000
Venue:
TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue
Year:
2009

Abstract

This paper addresses audio segmentation. We present a novel method for robust and accurate detection of acoustic change points in continuous audio streams. The segmentation procedure was developed as part of an audio diarization system for broadcast news audio indexing. Our goal was to remove the need for pre-determined decision thresholds for detecting segment boundaries, which standard segmentation procedures usually require. The proposed segmentation aims to estimate decision thresholds directly from the audio data currently being processed and thus reduces the need for additional threshold tuning on development data. It employs change-detection methods from two well-established audio segmentation approaches based on the Bayesian Information Criterion (BIC). Combining methods from both approaches enabled us to adaptively tune boundary-detection thresholds from the data being processed. All three segmentation procedures are tested and compared on a broadcast news audio database, where the proposed audio segmentation procedure shows its potential.
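
For readers unfamiliar with BIC-based change detection, the following is a minimal sketch of the generic ΔBIC test on which such approaches build. It is not the adaptive procedure proposed in the paper, whose details the abstract does not give. The sketch assumes each segment is modelled by a single full-covariance Gaussian; the function name `delta_bic`, the `penalty_weight` parameter, and the small ridge term are illustrative assumptions. Here the fixed penalty weight plays the role of the pre-set decision threshold that the abstract seeks to avoid tuning by hand.

```python
import numpy as np

def delta_bic(frames, split, penalty_weight=1.0):
    """Generic ΔBIC for a candidate change point (illustrative sketch).

    frames: array of shape (N, d), one feature vector per row.
    split:  candidate boundary index; frames[:split] vs frames[split:].
    penalty_weight: hypothetical fixed penalty factor (an assumption here).
    A positive value favours modelling the two sides separately.
    """
    n, d = frames.shape
    left, right = frames[:split], frames[split:]

    def logdet_cov(x):
        # Maximum-likelihood covariance; the ridge (an assumption) guards
        # against singular matrices on short segments.
        cov = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain of two Gaussians over one Gaussian.
    gain = 0.5 * (n * logdet_cov(frames)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))

    # Penalty for the extra mean vector and covariance matrix.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty
```

In a sketch like this, a boundary would be hypothesised wherever `delta_bic` is positive, so the choice of `penalty_weight` directly controls how many boundaries are detected.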