Correcting for missing data in information cascades

Authors: Eldar Sadikov, Montserrat Medina, Jure Leskovec, Hector Garcia-Molina
Affiliation (all authors): Stanford University, Stanford, CA, USA
Venue: Proceedings of the fourth ACM international conference on Web search and data mining
Year: 2011

Abstract

Transmission of infectious diseases, propagation of information, and the spread of ideas and influence through social networks are all examples of diffusion. In such cases we say that a contagion spreads through the network, a process that can be modeled by a cascade graph. Studying cascades and network diffusion is challenging because of missing data: even a single missing observation in a sequence of propagation events can significantly alter our inferences about the diffusion process. We address the problem of missing data in information cascades. Specifically, given only a fraction C' of the complete cascade C, our goal is to estimate properties of the complete cascade C, such as its size or depth. To do so, we first formulate a k-tree model of cascades and analytically study its properties in the face of missing data. We then propose a numerical method that, given a cascade model and an observed cascade C', estimates properties of the complete cascade C. We evaluate our methodology using information propagation cascades in the Twitter network (70 million nodes and 2 billion edges), as well as information cascades arising in the blogosphere. Our experiments show that the k-tree model is an effective tool for studying the effects of missing data in cascades. Most importantly, we show that our method (and the k-tree model) can accurately estimate properties of the complete cascade C even when 90% of the data is missing.
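The abstract describes the approach only at a high level. As a rough illustration, and not the authors' actual model or estimator, the Python sketch below rests on three assumptions: (1) a k-tree is a complete k-ary tree of depth d; (2) data goes missing by keeping each node independently with probability p, so an edge survives only when both endpoints are observed; and (3) the complete cascade can be estimated by a simple grid search that simulates sampled k-trees and matches the observed size and depth. The helpers `k_tree`, `observe`, and `fit_k_tree` are hypothetical names introduced here.

```python
import random


def k_tree(k, depth):
    """Build a complete k-ary tree of the given depth as a parent array.

    Assumption: the k-tree is modeled as a complete k-ary tree in which every
    non-leaf node has exactly k children. parent[v] is v's parent (-1 for the
    root), and every parent index is smaller than its children's indices.
    """
    parent = [-1]
    frontier = [0]
    for _ in range(depth):
        next_frontier = []
        for u in frontier:
            for _ in range(k):
                parent.append(u)
                next_frontier.append(len(parent) - 1)
        frontier = next_frontier
    return parent


def observe(parent, p, rng):
    """Keep each node independently with probability p (assumed missing-data model).

    Returns (observed size, number of connected fragments, observed depth), where
    an edge survives only if both endpoints are kept and the observed depth is the
    longest chain of surviving edges (0 if nothing is observed).
    """
    kept = [rng.random() < p for _ in parent]
    chain = [0] * len(parent)  # surviving-edge depth of each kept node in its fragment
    size = components = depth = 0
    for v, par in enumerate(parent):
        if not kept[v]:
            continue
        size += 1
        if par == -1 or not kept[par]:
            components += 1  # v's parent is missing (or v is the root): new fragment
        else:
            chain[v] = chain[par] + 1
            depth = max(depth, chain[v])
    return size, components, depth


def fit_k_tree(obs_size, obs_depth, p, ks=range(1, 6), max_depth=12,
               max_nodes=10_000, trials=20, seed=0):
    """Hypothetical simulation-based fit: choose the (k, d) whose sampled
    observations best match the observed size and depth (squared relative error),
    and return (k, d, size of that complete tree).
    """
    rng = random.Random(seed)
    best = None
    for k in ks:
        for d in range(1, max_depth + 1):
            parent = k_tree(k, d)
            if len(parent) > max_nodes:
                break  # trees only grow with d, so skip deeper ones for this k
            sims = [observe(parent, p, rng) for _ in range(trials)]
            mean_size = sum(s for s, _, _ in sims) / trials
            mean_depth = sum(h for _, _, h in sims) / trials
            err = ((mean_size - obs_size) / max(obs_size, 1)) ** 2 \
                + ((mean_depth - obs_depth) / max(obs_depth, 1)) ** 2
            if best is None or err < best[0]:
                best = (err, k, d, len(parent))
    return best[1:]


if __name__ == "__main__":
    rng = random.Random(42)
    true_tree = k_tree(3, 6)  # complete cascade C: 1093 nodes
    p = 0.1                   # only 10% of nodes observed
    size, comps, depth = observe(true_tree, p, rng)
    print("observed C':", size, "nodes,", comps, "fragments, depth", depth)
    print("naive size correction |C'|/p:", size / p)
    print("fitted (k, d, size):", fit_k_tree(size, depth, p))
```

Under independent node sampling, the observed size alone already gives an unbiased size estimate |C'|/p, so a model-based fit matters mainly for structural properties such as depth, which sampling shrinks sharply: at p = 0.1 an edge survives only with probability 0.01, so the observed fragments are typically just a few levels deep.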