In this paper we present a sound probabilistic approach to speaker diarization. We use a hybrid framework in which a discriminative model estimates a distribution over the number of speakers at each point of a multimodal stream. The output of this model is used as input to a generative model that can adapt to a novel test set and perform high-accuracy speaker diarization. We deal efficiently, and in a principled probabilistic manner, with the less common and therefore harder segments, such as silence and multi-speaker parts.
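
To make the hybrid idea concrete, the following is a minimal sketch and not the paper's actual model. It assumes a discriminative classifier has already produced, for each frame, a posterior over three hypothetical speaker-count classes (0 = silence, 1 = one speaker, 2 = overlapping speakers). A simple generative sequence model, here an HMM with a "sticky" transition prior, then decodes the most likely label sequence with the Viterbi algorithm. The function name `viterbi`, the `stay` parameter, and the use of raw posteriors as emission scores are all illustrative assumptions.

```python
import math

def viterbi(posteriors, stay=0.9):
    """Decode the most likely state path from per-frame state posteriors.

    posteriors: list of frames, each a list of per-state probabilities
                (assumed to come from some discriminative model).
    stay:       assumed probability of remaining in the same state;
                the remainder is split evenly over the other states.
    Requires at least two states and at least one frame.
    """
    n_states = len(posteriors[0])
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [
        [math.log(stay if i == j else switch) for j in range(n_states)]
        for i in range(n_states)
    ]
    eps = 1e-12  # avoids log(0) for zero-probability classes

    # Uniform initial distribution: only the first emission matters.
    score = [math.log(p + eps) for p in posteriors[0]]
    backpointers = []

    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j.
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + math.log(frame[j] + eps))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch the transition prior lets the sequence model override an isolated frame where the classifier is only marginally confident, which is one simple way a generative stage can refine frame-level discriminative output. A full system along the lines of the abstract would also model speaker identity and adapt to the test data, which this example does not attempt.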