Challenges in mining social network data: processes, privacy, and paradoxes

Authors:
Jon M. Kleinberg
Affiliations:
Cornell University
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Citing 17
Cited 16

Privacy-preserving data mining
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

On the design and quantification of privacy preserving data mining algorithms
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Mining the network value of customers
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Revealing information while preserving privacy
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Limiting privacy breaches in privacy preserving data mining
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Mining knowledge-sharing sites for viral marketing
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Maximizing the spread of influence through a social network
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Anti-aliasing on the web
Proceedings of the 13th international conference on World Wide Web

Information diffusion through blogspace
Proceedings of the 13th international conference on World Wide Web

Practical privacy: the SuLQ framework
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Query Incentive Networks
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science

The dynamics of viral marketing
EC '06 Proceedings of the 7th ACM conference on Electronic commerce

Privacy via pseudorandom sketches
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Group formation in large social networks: membership, growth, and evolution
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Structure and evolution of online social networks
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography
Proceedings of the 16th international conference on World Wide Web

Calibrating noise to sensitivity in private data analysis
TCC'06 Proceedings of the Third conference on Theory of Cryptography

Research ethics in the facebook era: privacy, anonymity, and oversight
CHI '09 Extended Abstracts on Human Factors in Computing Systems

A brief survey on anonymization techniques for privacy preserving publishing of social network data
ACM SIGKDD Explorations Newsletter

Persona: an online social network with user-defined privacy
Proceedings of the ACM SIGCOMM 2009 conference on Data communication

Reputation Cascade Model over Social Connections in Online Social Networks
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 02

Web-Traveler Policies for Images on Social Networks
World Wide Web

Privacy-enhanced public view for social graphs
Proceedings of the 2nd ACM workshop on Social web search and mining

Connected consumption: the hidden networks of consumption
CCNC'09 Proceedings of the 6th IEEE Conference on Consumer Communications and Networking Conference

A measure of online social networks
COMSNETS'09 Proceedings of the First international conference on COMmunication Systems And NETworks

Comparisons of randomization and K-degree anonymization schemes for privacy preserving social network publishing
Proceedings of the 3rd Workshop on Social Network Mining and Analysis

Frequent tree pattern mining: A survey
Intelligent Data Analysis

A comparison of two different types of online social network from a data privacy perspective
MDAI'11 Proceedings of the 8th international conference on Modeling decisions for artificial intelligence

Privacy threat analysis of social network data
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II

Measuring query privacy in location-based services
Proceedings of the second ACM conference on Data and Application Security and Privacy

Analysis of on-line social networks represented as graphs --- extraction of an approximation of community structure using sampling
MDAI'12 Proceedings of the 9th international conference on Modeling Decisions for Artificial Intelligence

Using stranger as sensors: temporal and geo-sensitive question answering via social media
Proceedings of the 22nd international conference on World Wide Web

Structure-aware graph anonymization
Web Intelligence and Agent Systems

Abstract

The proliferation of rich social media, on-line communities, and collectively produced knowledge resources has accelerated the convergence of technological and social networks, producing environments that reflect both the architecture of the underlying information systems and the social structure of their members. In studying the consequences of these developments, we are faced with the opportunity to analyze social network data at unprecedented levels of scale and temporal resolution; this has led to a growing body of research at the intersection of the computing and social sciences. We discuss some of the current challenges in the analysis of large-scale social network data, focusing on two themes in particular: the inference of social processes from data, and the problem of maintaining individual privacy in studies of social networks.

While early research on this type of data focused on structural questions, recent work has extended this to consider the social processes that unfold within the networks. Particular lines of investigation have focused on processes in on-line social systems related to communication [1, 22], community formation [2, 8, 16, 23], information-seeking and collective problem-solving [20, 21, 18], marketing [12, 19, 24, 28], the spread of news [3, 17], and the dynamics of popularity [29]. There are a number of fundamental issues, however, for which we have relatively little understanding, including the extent to which the outcomes of these types of social processes are predictable from their early stages (see e.g. [29]), the differences between properties of individuals and properties of aggregate populations in these types of data, and the extent to which similar social phenomena in different domains have uniform underlying explanations.

The second theme we pursue is concerned with the problem of privacy. While much of the research on large-scale social systems has been carried out on data that is public, some of the richest emerging sources of social interaction data come from settings such as e-mail, instant messaging, or phone communication in which users have strong expectations of privacy. How can such data be made available to researchers while protecting the privacy of the individuals represented in the data? Many of the standard approaches here are variations on the principle of anonymization: the names of individuals are replaced with meaningless unique identifiers, so that the network structure is maintained while private information is suppressed. In recent joint work with Lars Backstrom and Cynthia Dwork, we have identified some fundamental limitations on the power of network anonymization to ensure privacy [7]. In particular, we describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist between specific targeted pairs of nodes. The attacks are based on the uniqueness of small random subgraphs embedded in an arbitrary network, using ideas related to those found in arguments from Ramsey theory [6, 14]. Combined with other recent examples of privacy breaches in data containing rich textual or time-series information [9, 26, 27, 30], these results suggest that anonymization contains pitfalls even in very simple settings. In this way, our approach can be seen as a step toward understanding how techniques of privacy-preserving data mining (see e.g. [4, 5, 10, 11, 13, 15, 25] and the references therein) can inform how we think about the protection of even the most skeletal social network data.
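
As a rough illustration of the anonymization principle described in the abstract, the following Python sketch replaces node names with arbitrary integer identifiers and checks that the degree sequence is unchanged. It is a minimal sketch under stated assumptions, not the construction from [7]: the function names `anonymize` and `degree_sequence`, the toy edge list, and the example names are all hypothetical and chosen only for illustration.

```python
import random


def anonymize(edges, seed=0):
    """Replace each node name with a meaningless unique integer identifier.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    rng = random.Random(seed)
    nodes = sorted({v for e in edges for v in e})
    ids = list(range(len(nodes)))
    rng.shuffle(ids)  # identifiers carry no information about the names
    mapping = dict(zip(nodes, ids))
    anon_edges = {frozenset((mapping[u], mapping[v])) for u, v in edges}
    return anon_edges, mapping


def degree_sequence(edges):
    """Return the sorted list of node degrees (assumes no self-loops)."""
    deg = {}
    for e in edges:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return sorted(deg.values())


# Toy network with made-up names.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]
anon_edges, mapping = anonymize(edges)

# The names are gone, but the structure (here, the degree sequence) survives.
same_structure = degree_sequence(edges) == degree_sequence(anon_edges)
```

In this toy graph only one node has degree 3, so anyone who knows that fact about "carol" can locate her in the anonymized copy. This is a much simpler structural leak than the subgraph-based attacks the abstract refers to, but it shows why preserving the network structure can conflict with suppressing identity.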