Publishing time-series data under preservation of privacy and distance orders

Authors:
Yang-Sae Moon;Hea-Suk Kim;Sang-Pil Kim;Elisa Bertino
Affiliations:
Department of Computer Science, Kangwon National University, Korea;Department of Computer Science, Kangwon National University, Korea;Department of Computer Science, Kangwon National University, Korea;Department of Computer Science, Purdue University
Venue:
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Year:
2010

Abstract

In this paper we address the problem of preserving both mining accuracy and privacy when publishing sensitive time-series data. For example, people with heart disease do not want to disclose their electrocardiogram time-series, but they may still allow accurate patterns to be mined from them. Based on this observation, we introduce the related assumptions and requirements. We show that only randomization methods satisfy all the assumptions, but even those methods do not satisfy the requirements. We therefore discuss randomization-based solutions that satisfy all assumptions and requirements. For this purpose, we use the noise averaging effect of piecewise aggregate approximation (PAA), which may alleviate the destruction of distance orders in randomly perturbed time-series. Based on the noise averaging effect, we first propose two naive solutions that publish time-series using random data perturbation and compute distances using the PAA distance. There is, however, a tradeoff between these two solutions with respect to uncertainty and distance orders. We thus propose two more advanced solutions that take advantage of both naive solutions. Experimental results show that our advanced solutions are superior to the naive solutions.
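
The noise averaging effect mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's naive or advanced solutions, which the abstract does not specify. It assumes zero-mean uniform noise as the perturbation and the commonly used PAA distance (Euclidean distance between segment means, scaled by the square root of the segment length). All function names and parameters here (`paa`, `paa_distance`, `perturb`, `noise_levels`, `amplitude`) are hypothetical and chosen for illustration only.

```python
import math
import random


def paa(series, num_segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % num_segments != 0:
        raise ValueError("series length must be divisible by num_segments")
    seg_len = n // num_segments
    return [sum(series[i:i + seg_len]) / seg_len for i in range(0, n, seg_len)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_distance(a, b, num_segments):
    """PAA distance: scaled Euclidean distance between the segment means."""
    seg_len = len(a) // num_segments
    return math.sqrt(seg_len) * euclidean(paa(a, num_segments), paa(b, num_segments))


def perturb(series, amplitude, rng):
    """Assumed perturbation: add independent uniform noise in [-amplitude, amplitude]."""
    return [x + rng.uniform(-amplitude, amplitude) for x in series]


def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def noise_levels(series, amplitude, num_segments, seed=0):
    """Return (RMS noise per raw point, RMS noise per PAA segment mean)."""
    rng = random.Random(seed)
    published = perturb(series, amplitude, rng)
    raw_noise = [p - x for p, x in zip(published, series)]
    paa_noise = [
        p - x
        for p, x in zip(paa(published, num_segments), paa(series, num_segments))
    ]
    return rms(raw_noise), rms(paa_noise)


if __name__ == "__main__":
    # A synthetic periodic series standing in for sensitive data.
    original = [math.sin(2 * math.pi * t / 128) for t in range(1024)]
    raw, averaged = noise_levels(original, amplitude=1.0, num_segments=16)
    print("RMS noise on raw points:   ", raw)
    print("RMS noise on segment means:", averaged)
```

Because each segment mean averages many independent noise values, the noise remaining at the PAA level is much smaller than the noise on individual points. This is why distances computed on PAA representations of perturbed series may preserve distance orders better than distances computed on the perturbed points directly.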