In this paper we evaluate the effect of a document sanitization process on a set of information retrieval metrics, in order to measure both information loss and risk of disclosure. As an example document set, we use a subset of the WikiLeaks cables, made up of documents relating to five key news items that the cables revealed. To sanitize the documents, we have developed a semi-automatic anonymization process that follows the guidelines of US Executive Order 13526 (2009) by (i) identifying and anonymizing specific person names and data, and (ii) generalizing concepts using WordNet categories, in order to identify words categorized as classified. Finally, we manually revise the text from a contextual point of view and eliminate complete sentences, paragraphs and sections where necessary. We show that significant sanitization can be applied while maintaining the relevance of the documents to the queries corresponding to the five key news items.
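The evaluation idea can be illustrated with a minimal sketch: sanitize a small corpus, then compare a retrieval metric for the same query before and after. Everything here is an assumption made for illustration and not the process used in the paper: the `NAMES` set and the `GENERALIZE` map are hypothetical stand-ins for name detection and WordNet-based generalization, the four documents are invented, the ranking is a simple smoothed TF-IDF sum, and precision at k stands in for the paper's set of retrieval metrics. The manual revision step is not modelled.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins (assumptions): a fixed name list instead of name
# detection, and a hand-written map instead of WordNet categories.
NAMES = {"alice smith", "bob jones"}
GENERALIZE = {"missile": "weapon", "uranium": "material"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sanitize(text):
    """Mask listed person names, then replace listed terms with a broader one."""
    out = text
    for name in NAMES:
        out = re.sub(re.escape(name), "[PERSON]", out, flags=re.IGNORECASE)
    return re.sub(
        r"[A-Za-z]+",
        lambda m: GENERALIZE.get(m.group(0).lower(), m.group(0)),
        out,
    )


def rank(query, docs):
    """Return document indices ordered by a smoothed TF-IDF score for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(
            tf[t] * (math.log((n + 1) / (df[t] + 1)) + 1.0)
            for t in tokenize(query)
        )
        scores.append((-score, i))  # ties are broken by document index
    return [i for _, i in sorted(scores)]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return len(set(ranking[:k]) & relevant) / k


# Invented toy corpus; documents 0 and 1 are taken to be relevant to the query.
docs = [
    "Alice Smith discussed the missile shipment with the embassy.",
    "The embassy reported on uranium enrichment talks.",
    "Local elections were held without incident.",
    "Bob Jones visited the trade fair.",
]
relevant = {0, 1}
query = "embassy missile"

sanitized_docs = [sanitize(d) for d in docs]
p_before = precision_at_k(rank(query, docs), relevant, k=2)
p_after = precision_at_k(rank(query, sanitized_docs), relevant, k=2)

if __name__ == "__main__":
    print(sanitized_docs[0])
    print("P@2 before:", p_before, "after:", p_after)
```

In this sketch, a drop in precision for the news-item query would indicate information loss. Running the same comparison with a query made of sensitive terms (for example a masked name) would give a rough indication of remaining disclosure risk, since a well-sanitized corpus should no longer rank the original documents highly for such a query.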