Approaches of anonymisation of an SMS corpus

Authors:
Namrata Patel;Pierre Accorsi;Diana Inkpen;Cédric Lopez;Mathieu Roche
Affiliations:
LIRMM --- CNRS, Univ. Montpellier 2, France;LIRMM --- CNRS, Univ. Montpellier 2, France;Univ. of Ottawa, Canada;Objet Direct --- VISEO, France;LIRMM --- CNRS, Univ. Montpellier 2, France
Venue:
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Year:
2013

Abstract

This paper presents two methods for anonymising an SMS corpus. The first is an unsupervised approach called Seek&Hide; the implemented system uses several dictionaries and rules to predict whether an SMS needs to be anonymised. The second is a supervised approach based on machine learning techniques. We evaluate both approaches and propose a way to use them together: an SMS is checked by a human expert only when the two methods disagree on their prediction. This greatly reduces the cost of anonymising the corpus.
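
The combination scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the name dictionary, the function names, and the toy "model" below are all hypothetical stand-ins, chosen only to show how agreement between two predictors decides whether a message goes to a human expert.

```python
import re
from typing import Callable

# Hypothetical first-name dictionary; the abstract only says that
# "several dictionaries and rules" are used, without listing them.
FIRST_NAMES = {"marie", "pierre", "diana"}


def rule_based_needs_anonymisation(sms: str) -> bool:
    """Stand-in for the dictionary/rule-based method (assumption)."""
    tokens = re.findall(r"\w+", sms.lower())
    return any(tok in FIRST_NAMES for tok in tokens)


def toy_model_needs_anonymisation(sms: str) -> bool:
    """Placeholder for a trained classifier (assumption).

    Flags a message if any token after the first starts with an
    uppercase letter. A real supervised model would replace this.
    """
    tokens = sms.split()
    return any(tok[:1].isupper() for tok in tokens[1:])


def triage(
    sms: str,
    rule_pred: Callable[[str], bool],
    model_pred: Callable[[str], bool],
) -> str:
    """Return 'anonymise', 'keep', or 'review' for one message.

    A message is sent to human review only when the two
    predictors disagree, as described in the abstract.
    """
    by_rules = rule_pred(sms)
    by_model = model_pred(sms)
    if by_rules != by_model:
        return "review"
    return "anonymise" if by_rules else "keep"


if __name__ == "__main__":
    for msg in ["on se voit demain", "rdv avec Marie demain", "rdv avec marie demain"]:
        print(msg, "->", triage(msg, rule_based_needs_anonymisation, toy_model_needs_anonymisation))
```

Under this sketch, the manual workload is limited to the messages labelled `review`; how large that share is in practice depends on how often the two real methods disagree.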