This paper presents two anonymisation methods for processing an SMS corpus. The first is based on an unsupervised approach called Seek&Hide. The implemented system uses several dictionaries and rules to predict whether an SMS needs to be anonymised. The second is based on a supervised approach using machine learning techniques. We evaluate the two approaches and propose a way to use them together: an SMS is checked by a human expert only when the two methods disagree on their prediction. This greatly reduces the cost of anonymising the corpus.
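The combination scheme described above can be illustrated with a minimal sketch. It assumes each method is exposed as a function that returns True when a message needs anonymisation. The names `triage`, `toy_rule_based`, `toy_learned`, and `KNOWN_FIRST_NAMES` are hypothetical stand-ins introduced for illustration; they are not the Seek&Hide system or the trained classifier from the paper.

```python
from typing import Callable, Iterable, List, Tuple

# A predictor returns True when the message needs anonymisation.
Predictor = Callable[[str], bool]


def triage(
    messages: Iterable[str],
    rule_based: Predictor,
    learned: Predictor,
) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Split messages into automatically decided ones and ones for human review."""
    decided: List[Tuple[str, bool]] = []
    needs_review: List[str] = []
    for msg in messages:
        a = rule_based(msg)
        b = learned(msg)
        if a == b:
            # Both methods agree: accept the shared prediction.
            decided.append((msg, a))
        else:
            # Disagreement: route the message to a human expert.
            needs_review.append(msg)
    return decided, needs_review


# Hypothetical toy dictionary, standing in for the system's dictionaries.
KNOWN_FIRST_NAMES = {"marie", "paul"}


def toy_rule_based(msg: str) -> bool:
    """Assumed stand-in for the dictionary/rule method: flag known first names."""
    return any(tok.strip(".,!?").lower() in KNOWN_FIRST_NAMES for tok in msg.split())


def toy_learned(msg: str) -> bool:
    """Assumed stand-in for a trained classifier: flag a capitalised non-initial token."""
    return any(tok[0].isupper() for tok in msg.split()[1:])


corpus = [
    "on se voit demain",
    "Salut Marie, ça va ?",
    "appelle paul ce soir",
]
decided, needs_review = triage(corpus, toy_rule_based, toy_learned)
```

With these toy predictors, the first two messages are decided automatically and only the third, where the two predictors disagree, is sent for manual checking. The human workload therefore scales with the disagreement rate rather than with the corpus size.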