Social network analysis has become a common technique for modeling and quantifying the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. First, speech detection is performed through an audio/visual fusion scheme based on stacked sequential learning. In the audio domain, speech is detected by clustering audio features; each cluster is modeled by a one-state Hidden Markov Model containing a diagonal-covariance Gaussian Mixture Model. In the visual domain, speech detection is performed through differential-based feature extraction from the segmented mouth region, followed by a dynamic programming matching procedure. Second, to model the dyadic interactions, we employ the Influence Model, whose states encode the previously integrated audio/visual data. Third, the social network is extracted from the estimated influences. For our study, we use a set of videos from the New York Times' Blogging Heads opinion blog. Results are reported both in terms of the accuracy of the audio/visual data fusion and the centrality measures used to characterize the social network.
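To illustrate the final step, the minimal sketch below computes simple degree-style centrality measures from an estimated influence matrix. The matrix values and the normalization are illustrative assumptions, not the paper's exact formulation; the actual influence estimates would come from the fitted Influence Model.

```python
# Hypothetical influence matrix for three participants (made-up values):
# influence[i][j] is the estimated influence of participant i on participant j.
influence = [
    [0.0, 0.6, 0.4],
    [0.3, 0.0, 0.7],
    [0.5, 0.2, 0.0],
]

n = len(influence)

def out_strength(i):
    """Total influence participant i exerts on the others."""
    return sum(influence[i][j] for j in range(n) if j != i)

def in_strength(j):
    """Total influence the others exert on participant j."""
    return sum(influence[i][j] for i in range(n) if i != j)

# Degree-style centrality: strength normalized by the maximum
# possible number of ties a node can have, n - 1.
out_centrality = [out_strength(i) / (n - 1) for i in range(n)]
in_centrality = [in_strength(j) / (n - 1) for j in range(n)]
```

A participant with high out-centrality drives the conversation (influences others), while high in-centrality marks a participant who is strongly influenced; richer measures such as betweenness or eigenvector centrality would be computed on the same weighted directed graph.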