Anonymization of location data does not work: a large-scale measurement study

Authors:
Hui Zang; Jean Bolot
Affiliations:
Sprint, Burlingame, CA, USA; Technicolor, Palo Alto, CA, USA
Venue:
MobiCom '11 Proceedings of the 17th annual international conference on Mobile computing and networking
Year:
2011

Abstract

We examine a very large-scale data set of more than 30 billion call records made by 25 million cell phone users across all 50 states of the US and attempt to determine to what extent anonymized location data can reveal private user information. Our approach is to infer, from the call records, the "top N" locations for each user and correlate this information with publicly available side information such as census data. For example, the measured "top 2" locations likely correspond to home and work locations, and the "top 3" to home, work, and shopping/school/commute path locations. We consider the cases where those "top N" locations are measured at different levels of granularity, ranging from a cell sector to a whole cell, zip code, city, county, and state. We then compute the size of the anonymity set, namely the number of users who share a given set of "top N" locations at a given granularity level; a set of size 1 identifies its user uniquely. We find that the "top 1" location does not typically yield small anonymity sets. However, the "top 2" and "top 3" locations do, certainly at the sector or cell-level granularity. We consider a variety of factors that might affect the size of the anonymity set, for example the distance between the "top N" locations or the geographic environment (rural vs. urban). We also examine to what extent specific side information, in particular the size of the user's social network, decreases the anonymity set and therefore increases risks to privacy. Our study shows that sharing anonymized location data will likely lead to privacy risks and that, at a minimum, the data needs to be coarse in either the time domain (meaning the data is collected over short periods of time, in which case inferring the top N locations reliably is difficult) or the space domain (meaning the data granularity is strictly coarser than the cell level). In both cases, the utility of the anonymized location data will be decreased, potentially by a significant amount.
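The "top N" inference and the anonymity-set computation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (`(user, location)` pairs), the function names `top_n_locations`, `anonymity_set_sizes`, and `coarsen`, the toy data, and the choice to treat the top N locations as an unordered set are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict


def top_n_locations(records, n):
    """Map each user to the (unordered) set of their n most frequent locations.

    `records` is an iterable of (user, location) pairs, one per call record.
    Ties are broken by first appearance, which is how Counter.most_common
    orders equal counts. Treating the result as a set is an assumption.
    """
    counts = defaultdict(Counter)
    for user, location in records:
        counts[user][location] += 1
    return {
        user: frozenset(loc for loc, _ in counter.most_common(n))
        for user, counter in counts.items()
    }


def anonymity_set_sizes(top_locations):
    """For each user, count how many users share the same top-N location set."""
    group_sizes = Counter(top_locations.values())
    return {user: group_sizes[locs] for user, locs in top_locations.items()}


def coarsen(records, mapping):
    """Re-express each record at a coarser granularity (e.g. sector -> cell)."""
    return [(user, mapping[location]) for user, location in records]


# Hypothetical toy data: three users whose most frequent sector is the same.
records = [
    ("u1", "A"), ("u1", "A"), ("u1", "A"), ("u1", "B"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "A"), ("u2", "A"), ("u2", "D"), ("u2", "D"),
    ("u3", "A"), ("u3", "A"), ("u3", "E"),
]

# Hypothetical sector-to-cell mapping used to coarsen the data.
sector_to_cell = {"A": "cell0", "B": "cell1", "C": "cell1", "D": "cell1", "E": "cell2"}

top1_sizes = anonymity_set_sizes(top_n_locations(records, 1))
top2_sizes = anonymity_set_sizes(top_n_locations(records, 2))
top2_cell_sizes = anonymity_set_sizes(
    top_n_locations(coarsen(records, sector_to_cell), 2)
)
```

In this toy data, all three users share the same "top 1" sector, so each has an anonymity set of size 3, while their "top 2" sets are all distinct, giving sets of size 1. Coarsening from sectors to cells merges two of the users again, which mirrors the trade-off the abstract describes between spatial granularity and anonymity.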