Revisiting sequential pattern hiding to enhance utility

Authors:
Aris Gkoulalas-Divanis;Grigorios Loukides
Affiliations:
IBM Research-Zurich, Zurich, Switzerland;Vanderbilt University, Nashville, TN, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Sequence datasets arise in many applications, ranging from web usage analysis to healthcare studies and ubiquitous computing. Disseminating such datasets offers remarkable opportunities for discovering interesting knowledge patterns, but may lead to serious privacy violations if sensitive patterns, such as business secrets, are disclosed. In this work, we consider how to sanitize data to prevent the disclosure of sensitive patterns during sequential pattern mining, while ensuring that the nonsensitive patterns can still be discovered. First, we redefine the problem of sequential pattern hiding to capture the information loss incurred by sanitization in terms of both the modification of events (distortion) and the loss of nonsensitive knowledge patterns (side-effects). Second, we model sequences as graphs and propose two algorithms that solve the problem by operating on these graphs. The first algorithm attempts to sanitize data with minimal distortion, whereas the second focuses on reducing side-effects. Extensive experiments show that our algorithms outperform the existing solution in terms of data distortion and side-effects, and are also more efficient.
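
To make the two notions of information loss concrete, the following is a minimal Python sketch of the problem setting only. It is not the graph-based algorithms proposed in the paper. It assumes that a pattern is supported by a sequence when it occurs as a (not necessarily contiguous) subsequence, and that sanitization replaces events with a placeholder symbol. The names `MARK`, `hide`, `side_effects`, `max_support` and `min_support`, the naive greedy strategy, and the toy data are all hypothetical and chosen for illustration.

```python
MARK = "*"  # hypothetical placeholder that replaces a sanitized event


def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(event in it for event in pattern)


def support(pattern, dataset):
    """Number of sequences in `dataset` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in dataset)


def hide(dataset, sensitive, max_support):
    """Naive greedy hiding (an assumption, not the paper's method).

    For each non-empty sensitive pattern, mark events in supporting sequences
    until the pattern's support is at most `max_support`. Returns the
    sanitized copy and the distortion (number of marked events).
    """
    sanitized = [list(seq) for seq in dataset]
    distortion = 0
    for pattern in sensitive:
        for seq in sanitized:
            if support(pattern, sanitized) <= max_support:
                break
            while is_subsequence(pattern, seq):
                # Mark the first remaining occurrence of the pattern's first event.
                seq[seq.index(pattern[0])] = MARK
                distortion += 1
    return sanitized, distortion


def side_effects(original, sanitized, nonsensitive, min_support):
    """Nonsensitive patterns that were frequent before sanitization but not after."""
    return [
        p for p in nonsensitive
        if support(p, original) >= min_support
        and support(p, sanitized) < min_support
    ]


# Toy data: four event sequences, one sensitive pattern.
dataset = [list("abc"), list("acb"), list("abd"), list("bc")]
sensitive = [("a", "b")]
nonsensitive = [("b", "c"), ("a", "c"), ("a",)]

sanitized, distortion = hide(dataset, sensitive, max_support=1)
lost = side_effects(dataset, sanitized, nonsensitive, min_support=2)

print("sanitized:", sanitized)
print("distortion:", distortion)
print("side-effects:", lost)
```

In this toy run the greedy choice of which event to mark hides the sensitive pattern but also destroys two nonsensitive patterns, which illustrates why distortion and side-effects are treated as separate objectives.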