A machine learning solution to assess privacy policy completeness: (short paper)

Authors:
Elisa Costante;Yuanhao Sun;Milan Petković;Jerry den Hartog
Affiliations:
TU/e, Eindhoven, Netherlands;TU/e, Eindhoven, Netherlands;TU/e & Philips Research, Eindhoven, Netherlands;TU/e, Eindhoven, Netherlands
Venue:
Proceedings of the 2012 ACM workshop on Privacy in the electronic society
Year:
2012

Abstract

A privacy policy is a legal document that websites use to communicate how the personal data they collect will be managed. By accepting it, users agree to release their data under the conditions the policy states. Privacy policies should provide enough information for users to make informed decisions, and privacy regulations support this by specifying what kind of information has to be provided. Because privacy policies can be long and difficult to understand, users tend not to read them. As a result, users generally agree to a policy without knowing what it states or whether the aspects important to them are covered at all. In this paper we present a solution that assists the user by providing a structured way to browse the policy content and by automatically assessing the completeness of a policy, i.e. the degree of coverage of the privacy categories important to the user. The privacy categories are extracted from privacy regulations, while text categorization and machine learning techniques are used to verify which categories a policy covers. The results show the feasibility of our approach: it is possible to build an automatic classifier that assigns the right category to the paragraphs of a policy with an accuracy approximating that of a human judge.
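
To make the notion of completeness concrete, the following is a minimal sketch, not the authors' implementation. It assumes completeness can be read as the fraction of user-selected categories that at least one paragraph of the policy covers. The category names, the keyword lists, and the functions `keyword_classifier` and `completeness` are all hypothetical; the keyword matcher is only a toy stand-in for the trained text classifier the abstract describes.

```python
from typing import Callable, Iterable, Set

# Hypothetical categories and keywords, chosen for illustration only.
# The paper derives its categories from privacy regulations.
KEYWORDS = {
    "data_collection": ("collect",),
    "data_sharing": ("share", "third part"),
    "retention": ("retain", "retention"),
    "cookies": ("cookie",),
}


def keyword_classifier(paragraph: str) -> Set[str]:
    """Toy stand-in for a learned classifier: match keywords per category."""
    text = paragraph.lower()
    return {
        category
        for category, words in KEYWORDS.items()
        if any(word in text for word in words)
    }


def completeness(
    paragraphs: Iterable[str],
    important: Iterable[str],
    classify: Callable[[str], Set[str]] = keyword_classifier,
) -> float:
    """Fraction of the user's important categories covered by the policy.

    Assumption: an empty set of important categories counts as fully covered.
    """
    wanted = set(important)
    if not wanted:
        return 1.0
    covered: Set[str] = set()
    for paragraph in paragraphs:
        # Keep only the predicted categories the user actually cares about.
        covered |= classify(paragraph) & wanted
    return len(covered) / len(wanted)


policy = [
    "We collect your email address when you register.",
    "We use cookies to remember your preferences.",
]
score = completeness(
    policy, {"data_collection", "data_sharing", "cookies", "retention"}
)
print(score)  # 0.5: two of the four selected categories are covered
```

Passing the classifier as a parameter keeps the scoring independent of how paragraphs are labelled, so a trained model could replace the keyword matcher without changing `completeness`.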