Publishing anonymous survey rating data

  • Authors:
  • Xiaoxun Sun; Hua Wang; Jiuyong Li; Jian Pei

  • Affiliations:
  • Australian Council for Educational Research, Camberwell, Australia; Department of Mathematics and Computing, University of Southern Queensland, Toowoomba, Australia; School of Computer and Information Science, University of South Australia, Adelaide, Australia; School of Computing Science, Simon Fraser University, Burnaby, Canada

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2011

Abstract

In this paper we study the challenges of protecting the privacy of individuals in large public survey rating data. A recent study shows that personal information in supposedly anonymous movie rating records can be de-anonymised. Survey rating data usually contain ratings of both sensitive and non-sensitive issues, and the ratings of sensitive issues involve personal privacy. Even though survey participants do not reveal any of their ratings, their survey records are potentially identifiable using information from other public sources. None of the existing anonymisation principles (e.g., k-anonymity, l-diversity) can effectively prevent such breaches in large survey rating data sets. We tackle the problem by defining a principle called the $(k,\epsilon)$-anonymity model to protect privacy. Intuitively, the principle requires that, for each transaction t in the given survey rating data T, at least (k − 1) other transactions in T must have ratings similar to t, where the similarity is controlled by $\epsilon$. The $(k,\epsilon)$-anonymity model is formulated through its graphical representation, and a specific graph-anonymisation problem is studied by adopting graph-modification techniques from graph theory. Various cases are analysed and methods are developed to make the updated graph meet the $(k,\epsilon)$ requirements. The methods are applied to two real-life data sets to demonstrate their efficiency and practical utility.
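As a rough illustration of the $(k,\epsilon)$ requirement described in the abstract (not the authors' anonymisation algorithm), the sketch below checks whether a rating matrix satisfies the condition. The function name, the toy matrix, and the choice of Chebyshev (maximum per-issue gap) distance as the dissimilarity measure are illustrative assumptions only.

```python
import numpy as np

def satisfies_k_eps(ratings, k, eps):
    """Return True if every record has at least (k - 1) other records
    whose ratings are all within eps of its own (Chebyshev distance)."""
    ratings = np.asarray(ratings, dtype=float)
    for i, row in enumerate(ratings):
        # Distance from record i to every record: largest per-issue rating gap
        dists = np.abs(ratings - row).max(axis=1)
        neighbours = int(np.sum(dists <= eps)) - 1  # exclude record i itself
        if neighbours < k - 1:
            return False
    return True

# Toy survey: rows are respondents, columns are non-sensitive issues (assumed data)
T = [[5, 4, 3],
     [5, 4, 4],
     [1, 2, 2],
     [1, 2, 1]]
print(satisfies_k_eps(T, k=2, eps=1))  # True: every record has one eps-close neighbour
print(satisfies_k_eps(T, k=3, eps=1))  # False: no record has two eps-close neighbours
```

A data set failing this check would need modification (in the paper, via the graph-based anonymisation methods) before publication.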