Differentially private sequential data publication via variable-length n-grams

Authors:
Rui Chen; Gergely Acs; Claude Castelluccia
Affiliations:
Concordia University, Montreal, PQ, Canada; INRIA, Grenoble, France; INRIA, Grenoble, France
Venue:
Proceedings of the 2012 ACM Conference on Computer and Communications Security
Year:
2012

Abstract

Sequential data is increasingly used in a variety of applications, and publishing such data is vital to the advancement of these applications. However, as shown by the re-identification attacks on the AOL and Netflix datasets, releasing sequential data may pose considerable threats to individual privacy. Recent research has indicated that existing sanitization techniques fail to provide the privacy guarantees they claim. It is therefore urgent to respond to this failure by developing new schemes with provable privacy guarantees. Differential privacy is one of the few models that can provide such guarantees. Due to the inherent sequentiality and high dimensionality of sequential data, applying differential privacy to it is challenging. In this paper, we address this challenge by employing a variable-length n-gram model, which extracts the essential information of a sequential database in terms of a set of variable-length n-grams. Our approach makes use of a carefully designed exploration tree structure and a set of novel techniques based on the Markov assumption in order to reduce the magnitude of the added noise. The published n-grams are useful for many purposes. Furthermore, we develop a solution for generating a synthetic database, which enables a wider spectrum of data analysis tasks. Extensive experiments on real-life datasets demonstrate that our approach substantially outperforms state-of-the-art techniques.
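
To make the idea of publishing noisy n-gram counts concrete, the following is a minimal Python sketch of a plain baseline, not the algorithm of the paper. It counts all grams of length 1 to `n_max` over a small fixed alphabet and perturbs every count with Laplace noise. The exploration tree and the Markov-based techniques mentioned in the abstract are omitted. The Laplace mechanism, the truncation length `l_max`, the function names, and the add-or-remove-one-sequence neighbor model are all assumptions made for illustration; the abstract does not specify them.

```python
import itertools
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def count_ngrams(sequences, n_max, l_max):
    """Count grams of length 1..n_max after truncating each sequence to l_max items."""
    counts = Counter()
    for seq in sequences:
        seq = tuple(seq[:l_max])  # truncation bounds one sequence's contribution
        for n in range(1, n_max + 1):
            for i in range(len(seq) - n + 1):
                counts[seq[i:i + n]] += 1
    return counts


def noisy_ngram_counts(sequences, alphabet, n_max, l_max, epsilon, rng=None):
    """Hypothetical baseline: Laplace-noised counts for every gram over `alphabet`."""
    rng = rng or random.Random()
    counts = count_ngrams(sequences, n_max, l_max)
    # Assumption: neighboring databases differ by one whole sequence, which
    # contributes at most (l_max - n + 1) grams of each length n.
    sensitivity = sum(max(l_max - n + 1, 0) for n in range(1, n_max + 1))
    scale = sensitivity / epsilon
    noisy = {}
    # Noise is added to every possible gram, including unobserved ones, so the
    # set of released grams does not depend on the data. Items outside
    # `alphabet` are silently dropped from the release.
    for n in range(1, n_max + 1):
        for gram in itertools.product(alphabet, repeat=n):
            noisy[gram] = counts.get(gram, 0) + laplace(scale, rng)
    return noisy
```

For example, `noisy_ngram_counts([["a", "b", "a"]], ["a", "b"], n_max=2, l_max=3, epsilon=1.0)` returns one noisy count for each of the six possible grams. Enumerating the full gram domain grows exponentially in `n_max`, and the noise scale grows with both `n_max` and `l_max`, which illustrates why the abstract calls the high dimensionality of sequential data a challenge.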