A two-phase algorithm for mining sequential patterns with differential privacy

Authors:
Luca Bonomi;Li Xiong
Affiliations:
Emory University, Atlanta, GA, USA;Emory University, Atlanta, GA, USA
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013



Abstract

Frequent sequential pattern mining is a central task in many fields such as biology and finance. However, the release of these patterns raises growing concerns about individual privacy. In this paper, we study the sequential pattern mining problem under the differential privacy framework, which provides formal and provable privacy guarantees. The problem is particularly challenging because the differential privacy mechanism perturbs the frequency results with noise and the pattern space is high-dimensional. We propose a novel two-phase algorithm for mining both prefix and substring patterns. In the first phase, our approach exploits the statistical properties of the data to construct a model-based prefix tree, which is used to mine prefixes and a candidate set of substring patterns. In the second phase, the frequencies of the candidate substring patterns are refined using a novel transformation of the original data that reduces the perturbation noise. Extensive experiments on real datasets show that our approach is effective for mining both substring and prefix patterns in comparison with state-of-the-art solutions.
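
To make the idea of perturbing prefix frequencies with noise concrete, the following is a minimal sketch of a generic level-wise noisy prefix tree built with the Laplace mechanism. It is a common baseline, not the model-based prefix tree or the two-phase algorithm described in the abstract. The function names, the `threshold` parameter, and the even split of the privacy budget across levels are assumptions made for illustration, and neighboring datasets are assumed to differ by the addition or removal of one sequence.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_frequent_prefixes(sequences, alphabet, max_len, epsilon,
                              threshold, seed=None):
    """Hypothetical helper: return {prefix: noisy count} for retained prefixes.

    Each sequence has at most one prefix of a given length, so adding or
    removing one sequence changes at most one count per level by 1.
    Assumption: the budget `epsilon` is split evenly over `max_len` levels.
    """
    rng = random.Random(seed)
    scale = max_len / epsilon  # sensitivity 1 / (epsilon / max_len)
    retained = {}
    frontier = [""]
    for _ in range(max_len):
        next_frontier = []
        for prefix in frontier:
            # Every extension over the public alphabet is tested, including
            # those with a true count of zero, so the retained set depends
            # on the data only through the noisy counts.
            for symbol in alphabet:
                candidate = prefix + symbol
                true_count = sum(1 for s in sequences
                                 if s.startswith(candidate))
                noisy = true_count + laplace_noise(scale, rng)
                if noisy >= threshold:
                    retained[candidate] = noisy
                    next_frontier.append(candidate)
        frontier = next_frontier
    return retained


if __name__ == "__main__":
    data = ["ab", "ab", "ac", "ba"]
    print(private_frequent_prefixes(data, "abc", max_len=2,
                                    epsilon=1.0, threshold=1.5, seed=0))
```

The sketch also shows why the problem is hard: the noise scale grows with the tree depth, and the number of candidates grows with the alphabet size at every level, so small true counts are quickly swamped by noise.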