Ordered Data Set Vectorization for Linear Regression on Data Privacy

Authors:
Pau Medrano-Gracia;Jordi Pont-Tuset;Jordi Nin;Victor Muntés-Mulero
Affiliations:
DAMA-UPC, Computer Architecture Dept., Universitat Politècnica de Catalunya, Campus Nord UPC, C/Jordi Girona 1-3, 08034 Barcelona, (Catalonia, Spain);DAMA-UPC, Computer Architecture Dept., Universitat Politècnica de Catalunya, Campus Nord UPC, C/Jordi Girona 1-3, 08034 Barcelona, (Catalonia, Spain);IIIA, Artificial Intelligence Research Institute, CSIC, Spanish National Research Council, Campus UAB s/n, 08193 Bellaterra (Catalonia, Spain);DAMA-UPC, Computer Architecture Dept., Universitat Politècnica de Catalunya, Campus Nord UPC, C/Jordi Girona 1-3, 08034 Barcelona, (Catalonia, Spain)
Venue:
MDAI '07 Proceedings of the 4th international conference on Modeling Decisions for Artificial Intelligence
Year:
2007

Abstract

Many situations demand publishing data without revealing the confidential information they contain. Among the data protection methods proposed in the literature, those based on linear regression are widely used for numerical data. The main objective of these methods is to minimize both the disclosure risk (DR) and the information loss (IL). However, most of these techniques try to protect the non-confidential attributes based on the values of the confidential attributes in the data set. When these two sets of attributes are strongly correlated, an intruder is more likely to be able to reveal confidential data, which makes these methods unsuitable for many typical scenarios. In this paper we propose a new type of methods, called LiROP-k methods, that, based on linear regression, avoid the problems derived from the correlation between attributes in the data set. We propose the vectorization, sorting and partitioning of all values in the attributes to be protected in the data set, breaking the semantics of these attributes inside the record. We present two different protection methods: a synthetic protection method called LiROPs-k and a perturbative method called LiROPp-k. We show that, when the attributes in the data set are highly correlated, our methods present lower DR than other protection methods based on linear regression.
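The vectorization, sorting and partitioning steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the regression is applied, so the sketch assumes that each block of k consecutive sorted values is regressed on its position within the block and replaced by the fitted values. The function name `vectorize_sort_partition_fit` and the parameter `k` (block size) are hypothetical names chosen for illustration.

```python
import numpy as np


def vectorize_sort_partition_fit(data, k):
    """Illustrative sketch only (assumed details, not the paper's algorithm).

    data: 2-D array holding the attributes to be protected.
    k:    number of sorted values per block (assumed meaning of "k").
    """
    data = np.asarray(data, dtype=float)

    # Vectorization: pool every value of every attribute into one vector.
    flat = data.ravel()

    # Sorting: order the pooled values, remembering where each came from.
    order = np.argsort(flat, kind="stable")
    sorted_vals = flat[order]

    fitted = np.empty_like(sorted_vals)

    # Partitioning: walk over consecutive blocks of k sorted values.
    for start in range(0, sorted_vals.size, k):
        block = sorted_vals[start:start + k]
        if block.size < 2:
            # A single value cannot be fitted by a line; keep it as-is.
            fitted[start:start + k] = block
            continue
        # Assumption: least-squares line of value against in-block position.
        x = np.arange(block.size, dtype=float)
        slope, intercept = np.polyfit(x, block, deg=1)
        fitted[start:start + k] = slope * x + intercept

    # Undo the sort so each fitted value returns to its original cell.
    protected = np.empty_like(flat)
    protected[order] = fitted
    return protected.reshape(data.shape)
```

Because the values are pooled across attributes before sorting, a block generally mixes values from different attributes and records, which is one way to read the abstract's remark about breaking the semantics of attributes inside the record. A perturbative variant would presumably alter the original values rather than replace them, but the abstract gives no details, so none are sketched here.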