State aggregation in higher order Markov chains for finding online communities

Authors:
Xin Wang; Ata Kabán
Affiliations:
School of Computer Science, The University of Birmingham, Birmingham, UK; School of Computer Science, The University of Birmingham, Birmingham, UK
Venue:
IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning
Year:
2006

Abstract

We develop and investigate probabilistic approaches to state clustering in higher-order Markov chains. A direct extension of the aggregate Markov model to higher orders turns out to be problematic because of the large number of parameters it requires. In many cases, however, the events in the finite memory are not equally salient in terms of their predictive value, and we exploit this to reduce the number of parameters. We use a hidden variable to infer which of the past events is the most predictive, and we develop two different mixed-order approximations of the higher-order aggregate Markov model. We apply these models to the problem of identifying communities from event sequences produced through online computer-mediated interactions. Our approach bypasses the limitations of static approaches and offers a flexible modelling tool that can reveal novel and insightful structural aspects of online interaction dynamics.
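
To make the parameter argument in the abstract concrete, the following is a minimal sketch that counts free parameters under assumed parameterisations. The abstract does not specify the two mixed-order approximations, so the forms used here are illustrative assumptions only: a hidden lag variable selects one past event, and a latent class (community) then generates the next state. The function names and the `shared_membership` flag are hypothetical and do not come from the paper.

```python
# Free-parameter counts for several Markov model parameterisations.
# ASSUMPTION: the "mixed-order" forms below are illustrative guesses,
# not necessarily the two approximations developed in the paper.


def full_chain_params(n_states, order):
    """Full order-L chain: one distribution over next states per history."""
    return n_states ** order * (n_states - 1)


def direct_aggregate_params(n_states, n_classes, order):
    """Direct higher-order aggregate model (assumed form):
    P(next | history) = sum_k P(next | k) * P(k | history).
    The class-membership table is indexed by every length-L history."""
    membership = n_states ** order * (n_classes - 1)
    emission = n_classes * (n_states - 1)
    return membership + emission


def mixed_order_aggregate_params(n_states, n_classes, order,
                                 shared_membership=True):
    """Mixed-order aggregate model (assumed form):
    P(next | history) = sum_l w(l) * sum_k P(next | k) * P(k | state at lag l).
    A hidden lag variable picks the single most predictive past event.
    shared_membership=True uses one P(k | state) table for all lags;
    False uses a separate table per lag."""
    lag_weights = order - 1
    tables = 1 if shared_membership else order
    membership = tables * n_states * (n_classes - 1)
    emission = n_classes * (n_states - 1)
    return lag_weights + membership + emission


if __name__ == "__main__":
    n, k, order = 100, 5, 3  # hypothetical sizes chosen for illustration
    print("full chain:        ", full_chain_params(n, order))
    print("direct aggregate:  ", direct_aggregate_params(n, k, order))
    print("mixed, shared:     ", mixed_order_aggregate_params(n, k, order))
    print("mixed, per-lag:    ",
          mixed_order_aggregate_params(n, k, order, shared_membership=False))
```

Under these assumptions, the direct extension still grows exponentially with the order, because its membership table is indexed by whole histories, whereas the mixed-order counts grow at most linearly with the order.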