Inferring the causal networks behind observed data is an active area of research with wide applicability in fields such as epidemiology, microbiology and social science. In particular, recent research has focused on identifying how information propagates through the Internet. So far, this research has used only the temporal features of observations, and while reasonable results have been achieved, further information is often available that could be exploited. In this paper, we show that additional features of the observed data can be used very effectively to improve an existing method. Our particular example is inferring the underlying network through which text is reused on the Internet, although the general approach is applicable to other inference methods and information sources. We develop a method to identify how a piece of text evolves as it moves through an underlying network, and we show how substring information can be used to narrow down where in this evolutionary process a particular observation at a node lies. This in turn narrows down the number of ways in which the node could have acquired the infection. Text reuse is detected using a suffix tree, which is also used to identify the substring relations between chunks of reused text. We then use a modification of the NetCover method to infer the underlying network. Experimental results on both synthetic and real-life data show that using more information than timing alone leads to greater accuracy in the inferred networks.
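The abstract does not spell out the modified NetCover method, so the following is only a hypothetical Python sketch of the general idea under two stated assumptions: (a) network inference is framed as a set-cover problem, solved greedily, in which each candidate edge "covers" the infection events it could explain; and (b) text is only ever trimmed as it spreads, so a plausible source must have observed the same text, or a superstring of it, earlier in the same cascade. For brevity it uses Python's `in` operator for substring tests where the paper uses a suffix tree. The function name `infer_edges`, the `(cascade_id, node, time, text)` tuple layout and the `use_text` flag are illustrative inventions, not the authors' interface.

```python
from collections import defaultdict


def infer_edges(observations, use_text=True):
    """Hypothetical NetCover-style sketch: greedy set cover over candidate edges.

    observations: list of (cascade_id, node, time, text) tuples.
    use_text: if True, also require the substring relation (assumption (b));
              if False, only timing is used to find candidate sources.
    Returns directed edges (source, target) in the order they were chosen.
    """
    # Group observations by cascade (one cascade = one piece of reused text).
    cascades = defaultdict(list)
    for obs in observations:
        cascades[obs[0]].append(obs)

    # For every candidate edge, record the infection events it could explain.
    covers = defaultdict(set)
    to_explain = set()
    for cascade_id, cascade in cascades.items():
        for _, node, time, text in cascade:
            event = (cascade_id, node, time)
            for _, other, other_time, other_text in cascade:
                earlier = other != node and other_time < time
                # Substring test stands in for a suffix-tree query.
                related = (text in other_text) if use_text else True
                if earlier and related:
                    covers[(other, node)].add(event)
                    to_explain.add(event)

    # Greedy set cover: repeatedly pick the edge explaining the most
    # still-unexplained events (ties broken by the larger edge tuple).
    chosen = []
    uncovered = set(to_explain)
    while uncovered:
        best = max(covers, key=lambda e: (len(covers[e] & uncovered), e))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Toy data: C's text is a substring of A's text but not of B's text.
observations = [
    ("c1", "A", 0.0, "the quick brown fox jumps"),
    ("c1", "B", 1.0, "quick brown fox"),
    ("c1", "C", 2.0, "fox jumps"),
    ("c2", "A", 0.0, "hello world again"),
    ("c2", "B", 1.0, "hello world"),
    ("c2", "C", 2.0, "world again"),
]
print(infer_edges(observations))                  # [('A', 'C'), ('A', 'B')]
print(infer_edges(observations, use_text=False))  # [('B', 'C'), ('A', 'B')]
```

On this toy data, timing alone leaves B as an equally plausible source for C, and the tie-break happens to choose B; the substring test rules B out because C's text never appeared in B's observation. Events with no earlier, related observation (such as the first appearance of a text) are treated as cascade roots and need no explaining edge.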