Approaches of anonymisation of an SMS corpus

  • Authors:
  • Namrata Patel;Pierre Accorsi;Diana Inkpen;Cédric Lopez;Mathieu Roche

  • Affiliations:
  • LIRMM --- CNRS, Univ. Montpellier 2, France;LIRMM --- CNRS, Univ. Montpellier 2, France;Univ. of Ottawa, Canada;Objet Direct --- VISEO, France;LIRMM --- CNRS, Univ. Montpellier 2, France

  • Venue:
  • CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
  • Year:
  • 2013

Abstract

This paper presents two anonymisation methods for processing an SMS corpus. The first is based on an unsupervised approach called Seek&Hide. The implemented system uses several dictionaries and rules to predict whether an SMS needs to be anonymised. The second method is based on a supervised approach using machine learning techniques. We evaluate the two approaches and propose a way to use them together: an SMS is checked by a human expert only when the two methods disagree on their prediction. This greatly reduces the cost of anonymising the corpus.
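The combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the name dictionary, the two predictor functions, and the `route` helper are hypothetical stand-ins. In particular, `dictionary_predictor` only mimics the idea of a dictionary lookup, and `capital_predictor` is a toy heuristic standing in for a trained classifier. The sketch shows only the routing logic: messages on which both predictors agree are decided automatically, and the rest are queued for a human expert.

```python
import re
from typing import Callable, List, Tuple

# A predictor returns True when it believes the SMS needs anonymisation.
Predictor = Callable[[str], bool]

# Illustrative dictionary only; not the resource used in the paper.
NAME_DICTIONARY = {"marie", "cedric", "diana"}


def dictionary_predictor(sms: str) -> bool:
    """Hypothetical rule-based predictor: flag any token found in the dictionary."""
    tokens = re.findall(r"\w+", sms.lower())
    return any(token in NAME_DICTIONARY for token in tokens)


def capital_predictor(sms: str) -> bool:
    """Toy stand-in for a supervised classifier: flag a capitalised non-initial word."""
    words = sms.split()
    return any(word[0].isupper() for word in words[1:])


def route(
    corpus: List[str], first: Predictor, second: Predictor
) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Split a corpus into automatically decided messages and messages for review."""
    decided: List[Tuple[str, bool]] = []
    to_review: List[str] = []
    for sms in corpus:
        a, b = first(sms), second(sms)
        if a == b:
            decided.append((sms, a))  # both methods agree: accept the prediction
        else:
            to_review.append(sms)  # disagreement: send to a human expert
    return decided, to_review


corpus = [
    "slt marie tu viens ce soir",
    "On se voit demain",
    "Rdv avec Diana a 18h",
]
decided, to_review = route(corpus, dictionary_predictor, capital_predictor)
```

With this toy corpus, only the first message reaches the review queue, because the lowercase name is caught by the dictionary but missed by the capitalisation heuristic. The manual effort saved in practice depends on how often the two real methods agree.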