The increasing availability of electronic communication data, such as that arising from e-mail exchange, presents social and information scientists with new possibilities for characterizing individual behavior and, by extension, identifying latent structure in human populations. Here, we propose a model of individual e-mail communication that is rich enough to capture meaningful variability across individuals while remaining simple enough to be interpretable. We show that the model, a cascading non-homogeneous Poisson process, can be formulated as a double-chain hidden Markov model, which allows us to use an efficient inference algorithm to estimate the model parameters from observed data. We then apply this model to two e-mail data sets of 404 and 6,164 users, respectively, collected at two universities in different countries and in different years. We find that the resulting best-estimate parameter distributions for the two data sets are surprisingly similar, indicating that at least some features of communication dynamics generalize beyond specific contexts. We also find that the variability of individual behavior over time is significantly smaller than the variability across the population, suggesting that individuals can be classified into persistent "types". We conclude that communication patterns may prove useful as an additional class of attribute data, complementing demographic and network data, for user classification and outlier detection, a point that we illustrate with an interpretable clustering of users based on their inferred model parameters.
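The abstract names the model but does not spell out its parameterization. As a rough illustration only, the sketch below simulates one user's e-mail send times under an assumed version of a cascading non-homogeneous Poisson process: primary events follow a Poisson process whose rate varies by hour of day, and each primary event triggers a Poisson-distributed burst of follow-up e-mails separated by exponential gaps. The functions `simulate_cascading_nhpp` and `sample_poisson`, the 24-entry `hourly_profile`, and the parameters `base_rate`, `mean_cascade`, and `cascade_rate` are hypothetical. The paper's exact formulation and its double-chain hidden Markov model inference are not reproduced here.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson(mean) count with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cascading_nhpp(horizon_hours, base_rate, hourly_profile,
                            mean_cascade, cascade_rate, seed=0):
    """Simulate send times (in hours) under an ASSUMED cascading NHPP, for illustration only.

    Primary events: non-homogeneous Poisson process with rate
        base_rate * hourly_profile[hour of day], sampled by thinning.
    Each primary event triggers Poisson(mean_cascade) secondary events,
    separated by exponential gaps with rate cascade_rate (per hour).
    Returns (primary_times, all_times), both sorted.
    """
    rng = random.Random(seed)
    lam_max = base_rate * max(hourly_profile)
    if lam_max <= 0:
        return [], []
    primaries, events = [], []
    t = 0.0
    while True:
        # Propose candidates from a homogeneous process at the peak rate...
        t += rng.expovariate(lam_max)
        if t >= horizon_hours:
            break
        # ...and keep each one with probability rate(t) / lam_max (thinning).
        rate_t = base_rate * hourly_profile[int(t) % 24]
        if rng.random() * lam_max >= rate_t:
            continue
        primaries.append(t)
        events.append(t)
        # Cascade: a burst of follow-up events shortly after the primary event.
        s = t
        for _ in range(sample_poisson(rng, mean_cascade)):
            s += rng.expovariate(cascade_rate)
            if s < horizon_hours:
                events.append(s)
    return primaries, sorted(events)


# Hypothetical user: silent from midnight to 8 a.m., 200 days simulated.
profile = [0.0] * 8 + [1.5] * 16
primaries, events = simulate_cascading_nhpp(24 * 200, 0.5, profile, 2.0, 6.0, seed=1)
```

Thinning (Lewis and Shedler) is used here because the hourly rate is bounded, so candidates drawn at the peak rate can simply be accepted or rejected. Estimating the parameters from observed send times, which the abstract attributes to a double-chain hidden Markov model, would be a separate inference step not shown in this sketch.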