In this paper, we address the problem of preserving both mining accuracy and privacy when publishing sensitive time-series data. For example, people with heart disease may not want to disclose their electrocardiogram time-series, yet they may still allow accurate patterns to be mined from them. Based on this observation, we introduce the related assumptions and requirements. We show that only randomization methods satisfy all of the assumptions, but that even those methods do not satisfy the requirements. We therefore discuss randomization-based solutions that satisfy all of the assumptions and requirements. For this purpose, we use the noise averaging effect of piecewise aggregate approximation (PAA), which may alleviate the problem of destroying distance orders in randomly perturbed time-series. Based on the noise averaging effect, we first propose two naive solutions that publish time-series using random data perturbation and compute distances using the PAA distance. There is, however, a tradeoff between these two solutions with respect to uncertainty and distance orders. We thus propose two more advanced solutions that take advantage of both naive solutions. Experimental results show that our advanced solutions are superior to the naive solutions.
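The noise averaging effect of PAA mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's naive or advanced solutions. It assumes zero-mean Gaussian noise, equal-length segments, and the commonly used lower-bounding PAA distance; the function names (`paa`, `paa_distance`, `perturb`, `rms`) are hypothetical and chosen only for this illustration.

```python
import math
import random


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_distance(a, b, segments):
    """PAA distance: sqrt(segment width) times the distance between PAA vectors.

    This form lower-bounds the Euclidean distance of the original series.
    """
    width = len(a) // segments
    return math.sqrt(width) * euclidean(paa(a, segments), paa(b, segments))


def perturb(series, sigma, rng):
    """Random data perturbation (assumed here): add zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def rms(values):
    """Root mean square, used to measure the noise magnitude."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs match
    # Pure noise: averaging over a segment shrinks its magnitude.
    noise = perturb([0.0] * 1024, 1.0, rng)
    print("per-point noise RMS:", rms(noise))
    print("per-segment noise RMS after PAA:", rms(paa(noise, 16)))
```

Averaging w independent noise values with standard deviation sigma gives a value with standard deviation sigma divided by the square root of w, so distances computed on PAA segments are less disturbed by the perturbation than point-wise distances. That is the intuition behind computing distances with PAA over randomly perturbed series.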