Privacy-enhanced string matching with wordwise positional sampling

Authors:
Sung-Hwan Kim;Dae-Geon Kwon;Hwan-Gue Cho
Affiliations:
Pusan National University, Busan, South Korea;Pusan National University, Busan, South Korea;Pusan National University, Busan, South Korea
Venue:
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
Year:
2014

Abstract

Data anonymization is an important task for protecting privacy in data mining and processing. With text data being produced daily through web services and social networks, text anonymization has become an essential technique. In this paper, we present an anonymization method for privacy-enhanced string matching in natural language texts. Given a document composed of words and separators, our method samples the characters at particular positions of each word according to a given seed. String indexing and matching are then performed on this positionally sampled text, which protects the original text from exposure while retaining the matching statistics of pattern strings. In addition, we define measures of seed performance in terms of data utility and privacy protection, and we investigate which seeds perform better under these measures.
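The sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the seed format, so the sketch assumes a seed is a string of 0s and 1s in which a 1 keeps the character at that position, that the seed repeats cyclically over words longer than itself, and that a word is a maximal run of `\w` characters. The function names `sample_word`, `sample_text`, and `count_matches` are hypothetical, and matching is restricted to word-aligned patterns for simplicity.

```python
import re

# Assumption: a "word" is a maximal run of word characters; everything
# else is a separator and is copied through unchanged.
WORD = re.compile(r"\w+")


def check_seed(seed: str) -> None:
    """Reject seeds that are not non-empty 0/1 strings with at least one 1."""
    if not seed or set(seed) - {"0", "1"} or "1" not in seed:
        raise ValueError("seed must be a non-empty 0/1 string containing a 1")


def sample_word(word: str, seed: str) -> str:
    """Keep word[i] when the seed has a 1 at position i (seed repeats cyclically)."""
    check_seed(seed)
    return "".join(ch for i, ch in enumerate(word) if seed[i % len(seed)] == "1")


def sample_text(text: str, seed: str) -> str:
    """Sample every word independently; separators are left as they are."""
    return WORD.sub(lambda m: sample_word(m.group(0), seed), text)


def count_matches(text: str, pattern: str, seed: str) -> int:
    """Count word-aligned occurrences of the sampled pattern in the sampled text."""
    sampled_pattern = sample_text(pattern, seed)
    if not WORD.search(sampled_pattern):
        raise ValueError("pattern is empty after sampling")
    regex = re.compile(r"(?<!\w)" + re.escape(sampled_pattern) + r"(?!\w)")
    return len(regex.findall(sample_text(text, seed)))


print(sample_text("data privacy and data mining", "110"))  # daa prvay an daa miin
print(count_matches("data privacy and data mining", "data", "110"))  # 2
print(count_matches("dada data", "data", "110"))  # 2
```

In the last call the true count of "data" is 1, but "dada" and "data" both sample to "daa" under seed 110, so the sampled text reports 2. Collisions of this kind are what hide the original words, and they are also what can distort matching counts, which is the trade-off between privacy protection and data utility that the abstract says the seed measures are meant to capture.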