This paper addresses privacy preservation in classification rule mining. It proposes a data perturbation approach for hiding sensitive classification rules in categorical datasets. Such an approach is needed when the data must be published on the web and made freely available for public use, because alternatives such as output perturbation or cryptographic techniques restrict the usability of the data in various ways. The method builds on the characteristics of sequential covering classification algorithms. It modifies the tuples of a dataset D that are covered by sensitive rules so that they are redistributed to the "more important" non-sensitive rules. It also ensures that these tuples are assigned to the non-sensitive rules in proportion to each rule's rank in the ruleset. In this way, the sensitive rules are hidden while the existing structure of the ruleset, and hence the information value of the dataset, is preserved. A modification of the basic method that uses an alternative distribution procedure is also presented. Finally, a series of experiments evaluates the validity and effectiveness of the proposed approaches against existing similar ones.
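
The rank-proportional distribution mentioned above can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not state how rank is turned into a share, so the linear weighting below (the top rule of n rules gets weight n, the last gets weight 1), the function name `allocate_by_rank`, and the rule labels are all assumptions made for illustration. The sketch only decides how many tuples each non-sensitive rule would receive; how a tuple is actually modified to fall under a rule is left out.

```python
from fractions import Fraction


def allocate_by_rank(num_tuples, ranked_rules):
    """Split num_tuples among ranked_rules (best rule first).

    Hypothetical helper: shares are proportional to an assumed linear
    rank weight, and rounding uses the largest-remainder method so the
    counts always add up to num_tuples.
    """
    n = len(ranked_rules)
    if n == 0:
        raise ValueError("need at least one non-sensitive rule")

    # Assumed weighting: top-ranked rule gets weight n, last rule gets 1.
    weights = [n - i for i in range(n)]
    total = sum(weights)

    # Exact (fractional) share of each rule, then its integer part.
    exact = [Fraction(num_tuples * w, total) for w in weights]
    counts = [int(q) for q in exact]

    # Hand out the tuples lost to rounding, largest fractional part first;
    # ties go to the higher-ranked rule.
    leftover = num_tuples - sum(counts)
    order = sorted(range(n), key=lambda i: (-(exact[i] - counts[i]), i))
    for i in order[:leftover]:
        counts[i] += 1

    return dict(zip(ranked_rules, counts))


if __name__ == "__main__":
    # Ten tuples freed from sensitive rules, three non-sensitive rules.
    print(allocate_by_rank(10, ["r1", "r2", "r3"]))
```

Exact fractions are used instead of floats so that the rounding step cannot be affected by floating-point error; any other monotone weighting could be substituted for the linear one.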