Speaker change detection with privacy-preserving audio cues
Proceedings of the 2009 international conference on Multimodal interfaces
In this paper we introduce a new dynamic Bayesian network that separates speakers and their speaking turns in a multi-person conversation. We protect the speakers' privacy by using only features from which intelligible speech cannot be reconstructed. The model combines data from multiple audio streams, segments each stream into speech and silence, separates the individual speakers, and detects when nearby individuals who are not wearing microphones are speaking. Because no pre-trained speaker-specific models are used, the system can easily be applied in new and different environments. We show promising results on two very different datasets that vary in background noise, microphone placement and quality, and conversational dynamics.
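To make the pipeline concrete, the following is a minimal sketch of the two core steps the abstract describes: per-stream speech/silence segmentation over privacy-preserving features (here, frame log energy, from which speech cannot be reconstructed), followed by attributing each speech frame to a wearer's microphone. This is an illustrative simplification, not the paper's actual dynamic Bayesian network: the two-state HMM, its Gaussian emission parameters, and both function names are assumptions made for this example, and the handling of non-instrumented nearby speakers is omitted.

```python
# Hedged sketch -- NOT the paper's DBN. A two-state (silence/speech) HMM
# decoded with Viterbi over per-frame log energies, then naive speaker
# attribution by picking the loudest microphone per speech frame.
import math

def viterbi_speech_silence(log_energy, p_stay=0.9, mu=(0.0, 3.0), sigma=1.0):
    """Label each frame 0=silence, 1=speech with a two-state HMM.
    Emissions are Gaussians over log energy; transitions favor staying
    in the current state (p_stay). All parameters are illustrative."""
    def emit(state, x):
        # Log-likelihood of x under the state's Gaussian (constant dropped).
        return -0.5 * ((x - mu[state]) / sigma) ** 2

    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    score = [emit(0, log_energy[0]), emit(1, log_energy[0])]
    back = []
    for x in log_energy[1:]:
        new, ptr = [0.0, 0.0], [0, 0]
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            if stay >= switch:
                new[s], ptr[s] = stay + emit(s, x), s
            else:
                new[s], ptr[s] = switch + emit(s, x), 1 - s
        back.append(ptr)
        score = new
    # Backtrace from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def attribute_speaker(energies_per_mic, speech_flags):
    """Assign each speech frame to the microphone with the highest energy;
    frames where no stream is in the speech state stay unattributed (None).
    A real system would also flag frames with no dominant mic as speech
    from a nearby person who is not wearing a microphone."""
    turns = []
    for t in range(len(energies_per_mic[0])):
        if any(flags[t] for flags in speech_flags):
            turns.append(max(range(len(energies_per_mic)),
                             key=lambda m: energies_per_mic[m][t]))
        else:
            turns.append(None)
    return turns

# Toy usage: the wearer of mic 0 speaks during frames 2-4.
mic0 = [0.1, 0.2, 3.1, 3.0, 2.9, 0.1]
mic1 = [0.0, 0.1, 0.3, 0.2, 0.1, 0.0]
flags = [viterbi_speech_silence(m) for m in (mic0, mic1)]
print(attribute_speaker((mic0, mic1), flags))
# -> [None, None, 0, 0, 0, None]
```

The Viterbi decoding is what gives the segmentation its smoothness: a single loud or quiet frame will not flip the state because switching pays a transition penalty, which is the same intuition behind using a temporal model (a DBN) rather than thresholding frames independently.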