Information Systems
Data anonymization is an important task for protecting privacy in data mining and processing. With the volume of data produced daily through web services and social networks, text anonymization has become an essential technique. In this paper, we present an anonymization method for privacy-enhanced string matching in natural language texts. Given a document composed of words and separators, our method samples the characters at particular positions of each word according to a given seed. String indexing and matching are then performed on this positionally sampled text, which protects the original text from exposure while retaining the matching statistics of pattern strings. In addition, we define measures of seed performance in terms of data utility and privacy protection, and we investigate which seeds perform better under these measures.
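As a rough illustration of the idea of positional sampling, the following Python sketch samples each word of a text according to a seed and then matches a pattern word against the sampled text. It is not the paper's implementation. Several details are assumptions made only for this example: the seed is written as a string of '0' and '1' characters where '1' keeps the character at that position, the seed repeats cyclically for words longer than the seed, words are maximal runs of word characters, and the function names `sample_word`, `sample_text`, and `count_word_matches` are hypothetical.

```python
import re


def sample_word(word: str, seed: str) -> str:
    """Keep the characters of `word` whose position has '1' in `seed`.

    Assumption: the seed is applied cyclically to words longer than the seed.
    """
    return "".join(
        ch for i, ch in enumerate(word) if seed[i % len(seed)] == "1"
    )


def sample_text(text: str, seed: str) -> str:
    """Sample every word in `text`; separators are copied unchanged.

    Assumption: a word is a maximal run of word characters (regex \\w+).
    """
    return re.sub(r"\w+", lambda m: sample_word(m.group(0), seed), text)


def count_word_matches(sampled_text: str, pattern_word: str, seed: str) -> int:
    """Count words in the sampled text equal to the sampled pattern word."""
    sampled_pattern = sample_word(pattern_word, seed)
    words = re.findall(r"\w+", sampled_text)
    return sum(1 for w in words if w == sampled_pattern)


seed = "101"  # hypothetical seed: keep positions 0 and 2, drop position 1
sampled = sample_text("data privacy, data mining", seed)
print(sampled)                                    # dta pivcy, dta mnig
print(count_word_matches(sampled, "data", seed))  # 2
print(count_word_matches(sampled, "dota", seed))  # 2 (collision with "data")
```

In this sketch the pattern is sampled with the same seed before matching, so every true occurrence is still found, while distinct words that agree on the sampled positions become indistinguishable. That collision effect is one way to picture the trade-off between data utility and privacy protection that the seed measures are meant to capture.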