Refining causality: who copied from whom?

Authors:
Tristan Mark Snowsill;Nick Fyson;Tijl De Bie;Nello Cristianini
Affiliations:
University of Bristol, Bristol, United Kingdom;University of Bristol, Bristol, United Kingdom;University of Bristol, Bristol, United Kingdom;University of Bristol, Bristol, United Kingdom
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Citing 11

On finding lowest common ancestors: simplification and parallelization. SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
Improved performance of the greedy algorithm for partial cover. Information Processing Letters
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM)
Directed scale-free graphs. SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Tracking Information Epidemics in Blogspace. WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
Linear pattern matching algorithms. SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Meme-tracking and the dynamics of the news cycle. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Inference and Validation of Networks. ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
Inferring networks of diffusion and influence. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Reconstruction of causal networks by set covering. ICANNGA'11 Proceedings of the 10th international conference on Adaptive and natural computing algorithms - Volume Part II

Cited 2

The NetCover algorithm for the reconstruction of causal networks. Neurocomputing
Structure and dynamics of information pathways in online media. Proceedings of the sixth ACM international conference on Web search and data mining

Abstract

Inferring the causal networks behind observed data is an active area of research with wide applicability to areas such as epidemiology, microbiology and social science. In particular, recent research has focused on identifying how information propagates through the Internet. This research has so far used only temporal features of the observations, and while reasonable results have been achieved, further information is often available. In this paper we show that additional features of the observed data can be used very effectively to improve an existing method. Our particular example is inferring an underlying network for how text is reused on the Internet, although the general approach is applicable to other inference methods and information sources. We develop a method to identify how a piece of text evolves as it moves through an underlying network, and show how substring information can be used to narrow down where in the evolutionary process a particular observation at a node lies. This in turn narrows down the number of ways the node could have acquired the infection. Text reuse is detected using a suffix tree, which is also used to identify the substring relations between chunks of reused text. We then use a modification of the NetCover method to infer the underlying network. Experimental results on both synthetic and real-life data show that using more information than just timing leads to greater accuracy in the inferred networks.
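
The idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: a plain Python substring check stands in for the suffix tree, a generic greedy set cover stands in for the modified NetCover method, and the function names `candidate_sources` and `greedy_cover` are hypothetical. It also assumes that a copied chunk of text is always a substring of the text held by its source, which is one simple reading of "substring relations".

```python
from collections import defaultdict


def candidate_sources(observations, use_text=True):
    """Return {node: set of nodes that could have passed the text to it}.

    observations: list of (node, time, text) tuples for one piece of reused text.
    """
    candidates = {}
    for node, t, text in observations:
        sources = set()
        for other, t_other, text_other in observations:
            if other == node or t_other >= t:
                continue  # a source must be observed strictly earlier
            # Assumption: the copy is a substring of the source's text.
            # (The paper uses a suffix tree; `in` is a naive stand-in.)
            if use_text and text not in text_other:
                continue
            sources.add(other)
        candidates[node] = sources
    return candidates


def greedy_cover(cascades, use_text=True):
    """Greedily pick edges until every explainable infection is covered."""
    # Map each candidate edge to the (cascade index, node) infections it explains.
    explains = defaultdict(set)
    for i, observations in enumerate(cascades):
        for node, sources in candidate_sources(observations, use_text).items():
            for src in sources:
                explains[(src, node)].add((i, node))

    uncovered = set().union(*explains.values())
    chosen = []
    while uncovered:
        # Pick the edge that explains the most still-uncovered infections;
        # sorting first makes tie-breaking deterministic.
        edge = max(sorted(explains), key=lambda e: len(explains[e] & uncovered))
        chosen.append(edge)
        uncovered -= explains[edge]
    return chosen


cascade = [
    ("A", 0, "the quick brown fox jumps"),
    ("B", 1, "quick brown fox"),
    ("C", 2, "the quick brown"),
]
print(candidate_sources(cascade, use_text=False)["C"])  # timing only
print(candidate_sources(cascade, use_text=True)["C"])   # timing plus substrings
print(greedy_cover([cascade]))
```

With timing alone, node C could have copied from either A or B. Once the substring check is applied, B is ruled out, because C's text is not contained in B's, so fewer candidate edges reach the covering step.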