In recent years, growing attention has been paid to the problem of human-human communication, with the aim of devising artificial systems able to mediate conversational settings between two or more people. In this paper, we propose an automatic system, based on a generative structure, that classifies dialog scenarios. The generative model integrates a Gaussian mixture model with an (observed) Markovian influence model, and is fed with a novel low-level acoustic feature termed the steady conversational period (SCP). SCPs are built on the durations of contiguous slots of silence or speech, also taking conversational turn-taking into account. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features, and may be important for predicting how typical conversational situations evolve in different dialog scenarios. The model has been tested on an extensive set of real dyadic and multi-party conversational settings, including a recent dyadic dataset and the AMI meeting corpus. Comparative tests against conventional acoustic features and classification methods show that the proposed scheme achieves superior classification performance on all conversational settings in our datasets. Moreover, we show that our approach characterizes the nature of multi-party conversations (namely, the roles of the participants) very accurately, demonstrating great versatility.
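To make the SCP idea concrete, the following is a minimal sketch of extracting SCP-like segments from per-speaker binary voice-activity streams: each segment is a maximal run of frames over which the joint speech/silence configuration of all participants stays constant, recorded with its duration. The function name, the tuple-based frame encoding, and the omission of the paper's duration quantization and turn-taking labeling are illustrative assumptions, not the authors' exact procedure.

```python
def steady_periods(vad):
    """Extract steady-period-like segments from a voice-activity sequence.

    vad: list of per-frame tuples, one binary flag per speaker
    (1 = speaking, 0 = silent). Returns (state, duration) pairs for
    the maximal runs in which the joint state is constant.
    """
    periods = []
    start = 0
    for i in range(1, len(vad) + 1):
        # Close the current run at a state change or at the end of the stream.
        if i == len(vad) or vad[i] != vad[start]:
            periods.append((vad[start], i - start))
            start = i
    return periods


# Two speakers: A talks for 3 frames, both are silent for 2, B talks for 4.
frames = [(1, 0)] * 3 + [(0, 0)] * 2 + [(0, 1)] * 4
print(steady_periods(frames))
# [((1, 0), 3), ((0, 0), 2), ((0, 1), 4)]
```

In the full system, the durations and transition statistics of such segments, rather than any phonetic content, would feed the generative classifier.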